METHOD AND APPARATUS FOR REAL TIME 
MONITORING OF INDUSTRIAL ELECTROLYTES 

Priority Claim 

This application claims priority from commonly owned, copending United States 
Provisional Application Serial No. 60/397,120, filed 19 July 2002, the disclosure of which is 
hereby incorporated herein by reference. 

Field of the Invention 

The present invention relates generally to any electrolyte and methods for monitoring 
the constituents contained therein. More specifically, the present invention relates to plating 
baths and methods for monitoring the constituents contained therein based on chemometric 
analysis of voltammetric data obtained for these baths. More particularly, the method of the 
present invention relates to application of numerous chemometric techniques of modeling 
power, outlier detection, regression and calibration transfer for analysis of voltammetric data 
obtained for various plating baths. 

Description of Related Art 

Methods for analyzing electroplating baths 

A typical plating bath solution comprises a combination of several distinct 
constituents which are broadly divided into major constituents and trace constituents. The 
major constituents typically make up about 2 to 50 percent of the total bath weight or 
volume. Trace constituents are present in smaller quantities, usually less than 1 percent of 
the total weight or volume. The techniques for the analysis of inorganic and organic 
constituents of plating baths usually appear separately in the literature. That is also the way 
they are briefly reviewed below. 

Methods for monitoring of organic constituents 

Haak et al. [P1, P2] have developed a method known as cyclic voltammetric stripping 
(CVS). The method exploits the inhibition of the rate of deposition caused by 
adsorption of additives on the surface of a platinum rotating disk electrode during cyclic 
electrodeposition. Such inhibition is quantified by measuring the decrease of the anodic 
charge involved in the CV stripping peak. A decrease in deposited charge is correlated with 
an increase in concentration of the additives. CVS is the most commonly used technique 
today [P3]. Despite claims that CVS can be used as a monitoring tool (and the availability of 
a commercial CVS instrument), many serious questions about the technique remain. 
The CVS method is not an analytical procedure as the term is generally understood: it is not 
specific for a given chemical compound, and the relationship between measured charge and 
solution concentration is not direct. The method does not measure a quantity that can be 
directly related to the concentrations of components of a known solution. Additionally, a 
single quantity, a charge, is used to estimate the solution concentration of a multicomponent 
additive, so CVS measures only the aggregate effect of all of the additive components. 
For CVS monitoring to be useful, the ratios of the components of the additive system must 
remain constant as the additive is consumed. Some effort has been made to use the 
technique to determine the individual components of a multi-component additive [P4], but it 
is questionable whether such a procedure can be the basis of plating solution control. CVS 
is not suitable for continuous analysis of some baths because contaminant buildup 
at the working electrode affects adsorption of additives. 
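
The charge measurement underlying CVS can be illustrated with a minimal sketch. It assumes the stripping peak is available as sampled time and current values and integrates the anodic current with the trapezoid rule. The function name, the sample values and the clipping of cathodic current are illustrative assumptions, not part of the method of [P1, P2].

```python
def stripping_charge(times_s, currents_a):
    """Integrate the anodic (positive) current over time with the trapezoid rule.

    Returns the charge in coulombs. Negative (cathodic) samples are clipped to
    zero, a simplification that ignores any baseline correction.
    """
    charge = 0.0
    for k in range(1, len(times_s)):
        i0 = max(currents_a[k - 1], 0.0)
        i1 = max(currents_a[k], 0.0)
        charge += 0.5 * (i0 + i1) * (times_s[k] - times_s[k - 1])
    return charge


# Hypothetical triangular stripping peaks sampled every 0.1 s (values in amperes).
times = [0.1 * k for k in range(11)]
peak_no_additive = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
peak_with_additive = [0.5 * i for i in peak_no_additive]

q0 = stripping_charge(times, peak_no_additive)
q = stripping_charge(times, peak_with_additive)
ratio = q / q0  # normalized charge; lower values indicate stronger inhibition
```

The single number `ratio` is all that such a measurement yields, which is why it cannot by itself separate the contributions of several additive components.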

Tench and White introduced a technique called Cyclic Pulse Voltammetric Stripping 
(CPVS) [P5]. This method involves sequentially pulsing the electrode between appropriate 
metal plating, metal stripping, cleaning, and equilibrium potentials whereby the electrode 
surface is maintained in a clean and reproducible state. This method overcomes the problem 
of contaminant buildup in the copper plating bath affecting the copper deposition rate which 
interferes with brightener analysis. 

An improvement of the CVS and CPVS methods is found in [P6]. In accordance with that 
invention, in order to prevent contaminant buildup on the electrodes, a pause without applied 
potential is used following each completed cycle. During either this applied potential or the 
open circuit condition, contaminants are either eliminated from the electrode surface or fail 
to deposit on the surface. 

Eliash [P7] demonstrated an in-situ method involving applying a brief voltammetric 
plating signal to a pretreated electrode, applying a rapid stripping signal to the plated 
electrode, and monitoring the resultant stripping signal response current whose 
characteristics indicate the particular trace constituent concentration level. 

Sonnenberg et al. [P8] have developed a direct method of analyzing brighteners and 
levelers based on the differential adsorption of these additives on a working electrode during 
a sequence of steps prior to and during metal plating. The sensitivity of the method allows 
for the determination of both brightener and leveler in the same sample without cyclic 
processing. 

Chang et al. [P9] have developed a cyclic voltammetric (CV) method for measuring 
the concentration of an unknown subcomponent in the additive mixtures in a plating 
solution. The performance of the method is demonstrated using the example of an acid 
copper plating bath. The method is based on measurements of cathodic copper plating 
charge for different volumes of added unknown mixture to the calibration solution which 
contains the component of interest in a known concentration near that which would be 
expected in the unknown. The slopes of the calibration standard curve and the unknown 
mixture curve are also compared. 

All methods presented above require a rotating disk electrode and controlled bath 
hydrodynamics. 

Chang et al. [P10] have developed a method for analyzing organic additives in 
methane sulfonic acid based solution for electroplating of Pb-Sn alloys. The method is 
based on standard addition measurements of the height of the peak of square wave 
voltammograms obtained at a hanging mercury drop electrode (HMDE). The major 
drawback of this method is the use of mercury electrodes, which create environmentally 
dangerous waste and need to be operated and maintained by highly qualified personnel. 
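
A standard addition measurement of the kind mentioned above can be sketched as follows. The sketch assumes a linear response of peak height to concentration and ignores dilution; the function name and the synthetic data are hypothetical and are not taken from [P10].

```python
def standard_addition(added_conc, peak_heights):
    """Estimate the unknown concentration from a standard addition series.

    Fits peak_height = intercept + slope * added_conc by ordinary least squares
    and returns intercept / slope, the magnitude of the x-axis intercept.
    """
    n = len(added_conc)
    mean_x = sum(added_conc) / n
    mean_y = sum(peak_heights) / n
    sxx = sum((x - mean_x) ** 2 for x in added_conc)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(added_conc, peak_heights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope


# Synthetic series: true concentration 2.5 (arbitrary units), sensitivity 0.8.
added = [0.0, 1.0, 2.0, 3.0]
heights = [0.8 * (2.5 + c) for c in added]
estimate = standard_addition(added, heights)
```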

Ludwig [P11] developed a method based on AC voltammetry by measuring AC 
current as a function of a varying DC potential and expressing it as an AC current spectrum (or 
fingerprint). The spectra obtained contain fine structure and enable monitoring of minor 
plating bath constituents. AC voltammetry was utilized to monitor organic additives in-situ, 
without any sample preparation and/or utilization of standard solutions. 

Bonivert et al. [P12, P13] have developed an in-situ electrochemical detection 
method, which employs a Tuned Frequency Impedance Probe (TFIP), to measure dilute 
concentrations of surfactants in plating solutions. Current due to a modulation voltage flows 
from the counter electrode through the increased resistance at the working electrode. The 
increased resistance at the working electrode causes the phase of the voltage applied to the 
inverting input of an amplifier to lag with respect to the phase of the modulation voltage. 
The phase of the output voltage from the amplifier is compared to the phase of modulation 
voltage using a lock-in amplifier. The result of the comparison, the phase difference, is 
output as a voltage signal from the amplifier to a utilization device. This voltage correlates 
directly to the surfactant concentration adsorbed on the working electrode. 

A quantitative analytical technique, chromatography, is available for some of the 
components of some electroplating solutions [P14]. High-performance liquid 
chromatography (HPLC) has the potential advantage of 
being able to detect individual ionic components of the additive in the plating bath. 
However, analysis methods and separation columns are not available for many of the 
commercial additives on the market today. Also, some additives may require sample 
preparation before HPLC analysis can be performed. Additionally, the aggressiveness of the 
bath samples limits the lifetime of chromatographic columns to several hundred analyses, 
thereby increasing maintenance costs. 

Newton and Kaiser [P15] presented current developments on applications of liquid 
chromatography techniques for determination of additive concentrations and contaminant 
analysis. They also discussed increasing requirements (mostly set by the semiconductor 
industry) for the purity, plating effectiveness and plating speed of electroplating bath 
chemicals. 

Horkans and Dukovic [P16] conducted a comparative study on determination of 
concentration of SPS-based additives in copper plating baths using CVS and HPLC. They 
noticed that although CVS, due to its convenience, is more common than HPLC for 
integration with plating tools, it is not a selective technique (in contrast to HPLC) for 
suppressor concentration determination. All species (both those deliberately added and 
degradation products) that similarly affect Cu deposition kinetics are lumped together in the 
CVS determination of concentration. They also noticed that CVS and HPLC methods agree 
in SPS analysis only in standard solutions or in unused plating baths. 

Methods for monitoring of inorganic constituents 



Techniques for monitoring the major constituents of plating baths typically involve 
removing a sample of the chemical solution from the plating tank for subsequent wet 
chemical analysis. Wet chemical analysis methods must usually be performed by highly 
skilled personnel. Specialized and costly chemical analysis equipment and supplies are 
required. Furthermore, the delay between drawing samples and receiving measurement 
results can be anywhere from several hours to several days. The slow response time of wet 
chemical analysis limits the extent to which a high quality and high-speed plating bath can 
be continuously maintained. 

Another off-line method applied in the analysis of metals in the plating bath is X-ray 
fluorescence. This method is very precise and competitive with wet chemical techniques in 
terms of accuracy, especially for metals that lack reliable wet chemical methods. 
Unfortunately, X-ray fluorescence shares all the disadvantages of wet chemical methods 
discussed in the previous paragraph as well as the high cost of the equipment. 

On-line methods for major constituents have been developed, and are routinely used 
despite their high cost and inconvenience in that often the solution must be pumped out of 
the plating tank into equipment of substantial size and complexity. Sometimes reagent 
solutions are automatically mixed with the pumped solution. Usually there is no room on a 
plating floor for close proximity of such equipment. Also, the complexity of the automatic 
solution mixing and preparatory analytical steps results in low reliability (due to, for 
instance, reagent instability and rinsing cross contamination) and high cost. In addition, and 
perhaps of paramount importance, is that the methods and equipment are not universal in 
application, and therefore cannot be used for all the plating tanks in the plating shop. 
Methods included in these real-time but low-practicality procedures are ion 
chromatography, differential pulse polarography (DPP), cyclic linear sweep voltammetric 
stripping (CVS), optrodes, and UV fluorescence. 

Eliash et al. [P17] have developed a method for in-situ and on-line monitoring of 
metal ion content. The method involves applying a sweep signal to the pretreated working 
electrode, and measuring the DC voltammetric peak current of the resulting response signal. 
The DC voltammetric peak current is proportional to the metal ion content of the plating 
bath. 

Phan et al. [P18] have developed a method based on DC-and-AC voltammetry for 
real-time in-situ monitoring of major constituents in plating baths. The concentration of 
major constituents is determined based on the peak current of DC-and-AC voltammograms. 

Ludwig et al. [P19] have developed a method of monitoring acid concentration in 
plating baths. The AC response current provides an accurate indication of the acid 
concentration within the solution. 

Application of chemometric techniques in electrochemistry 

Routine applications of chemometric methods abound in the literature of analytical 
chemistry, but only a small fraction of this literature has been devoted to the field of 
electrochemistry [L1]. Although the number of groups employing chemometric methods in 
electrochemistry has been limited, there has been some good progress made by them. A 
brief overview appears below of a selection of chemometric methods used in novel ways in 
the field of electroanalytical chemistry, which have appeared throughout the last ten years. 

Calibration and Resolution 

Calibration refers to the process of relating the analyte concentration or the measured 
value of a physical or chemical property to a measured response. 
This section is also partially concerned with the mathematical resolution of mixtures. A 
mathematical resolution of mixtures is usually performed in far less time than a physical or 
chemical separation. 
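
As an illustration of calibration and mathematical resolution, the following minimal sketch applies principal component regression (PCR) in its textbook form to two strongly overlapping synthetic peaks. The Gaussian peak shapes, noise level, function names and the choice of two factors are all assumptions made for the example; it is not the implementation of any work cited in this section.

```python
import numpy as np


def gaussian_peak(x, center, width):
    """Synthetic stand-in for the response of one pure component."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)


def pcr_fit(X, y, n_factors):
    """Textbook PCR: mean-center, project on leading loadings, regress on scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_factors].T                      # loadings, shape (points, factors)
    scores = (X - x_mean) @ V
    q, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ q              # regression vector in data space


def pcr_predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


rng = np.random.default_rng(0)
potential = np.linspace(0.0, 1.0, 200)
pure = np.vstack([gaussian_peak(potential, 0.45, 0.08),
                  gaussian_peak(potential, 0.55, 0.08)])

# Hypothetical training set: 20 mixtures with known concentrations.
conc = rng.uniform(0.5, 2.0, size=(20, 2))
X = conc @ pure + rng.normal(scale=0.005, size=(20, 200))

model = pcr_fit(X, conc[:, 0], n_factors=2)
pred = pcr_predict(model, np.array([[1.2, 0.7]]) @ pure)
```

Here `pred[0]` should come out close to 1.2 even though neither peak is resolved from the other at any single potential, which is the practical meaning of mathematical resolution.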



Henrion et al. [L2] reported application of Partial Least Squares (PLS) regression to 
resolve quantitatively overlapping responses obtained from differential pulse anodic 
stripping voltammetry (DPASV). 

Ni et al. used PLS and Principal Component Regression (PCR) [L3] and iterative 
target transformation factor analysis (ITTFA) [L4] to resolve the overlapping polarograms 
of organic compounds, pyrazine and its methyl derivatives. Ni et al. also applied PLS and 
PCR [L5] and ITTFA [L6] to resolve the voltammograms of a quaternary mixture of 
Amaranth, Sunset Yellow, Tartrazine and Ponceau 4R, which present overlapped peaks. Ni 
et al. [L7] employed PLS and PCR to resolve overlapping linear sweep voltammetric (LSV) 
oxidation peaks obtained for a quaternary mixture of synthetic food antioxidants: butylated 
hydroxyanisole, butylated hydroxytoluene, propyl gallate and tert-butylhydroquinone at a 
glassy carbon electrode. Ni et al. [L8] also used the same chemometric techniques for 
interpretation of complex differential pulse stripping voltammograms of the antipsychotic drugs 
chlorpromazine hydrochloride and promethazine hydrochloride obtained at a glassy carbon 
electrode. 

Alonso Lomillo et al. [L9] employed PLS regression for the resolution of the 
overlapping DPP signals from a ternary mixture of drugs: rifampicin, isoniazid and 
pyrazinamide. The authors applied a genetic algorithm to select some of the predictor 
variables (potentials of the polarogram). 

Alius and Brereton [L10] used a chemometric approach to linear calibration to 
determine thallium in cement dust and sediment samples using anodic stripping 
voltammetry. 

Reviejo et al. [L11] applied PLS regression to polarographic analysis of emulsified 
mixtures in any combination of four organochlorine pesticides, using a calibration set of 35 
samples, with current measurements at nine different potentials. 



The study of Jagner et al. [L12] demonstrates that there are significant advantages to 
be gained by using multivariate calibration in electroanalysis of systems with several 
interfering components. They were able to determine arsenic by stripping analysis in the 
presence of multiple interfering species that, with the conventional univariate calibration 
methods used by most electrochemists, would have rendered the analysis useless. 

The ability of PLS to resolve binary and ternary mixtures of organic 
compounds from their DPP signals was reported by Cabanillas et al. [L13,L14,L15]. 
The PLS-1 method was found by Guiberteau et al. [L16] to provide satisfactory calibration 
for indirect differential pulse voltammetric (DPV) determination of the carbamate pesticides 
carbaryl and carbofuran. The same group used PLS to calibrate sampled direct current, 
DPV and cyclic voltammetric (CV) data for binary and ternary mixtures of phenolic 
antioxidants used in the food industry [L17]. The calibration was externally validated on 
packet soup samples. Guiberteau Cabanillas et al. [L18] utilized PLS and artificial neural 
networks to determine each component in the following binary mixtures: atrazine-simazine 
and terbutryn-prometryn, based on their overlapping polarographic signals. Lastres et 
al. [L19] and Chan et al. [L20] applied neural nets to calibration problems in solving 
interference caused by the formation of intermetallic compounds in anodic stripping. 

Richards et al. [L21] demonstrated optimization of a neural network model for the 
calibration of dual pulse staircase voltammetric data for a ternary aliphatic mixture of 
ethanol, fructose and glucose. In order to reduce training time, the number of network 
inputs was reduced by application of PCA, and data scores instead of the original data were used 
as input. 

Wehrens and van der Linden [L22] employed neural networks to calibrate a 
voltammetric sensor consisting of an array of modified microelectrodes. Linear calibration 
methods, like PCR, did not yield good results because of the inherent non-linear nature of 
the LSV data for mixtures of ortho-, meta-, and para-dinitrobenzene, and monosubstituted 
nitrobenzene. Matos et al. [L23] conducted flow injection amperometric quantification of 
ascorbic acid, dopamine, epinephrine and dipyrone in mixtures by using an array of 
modified microelectrodes. The experimental results were analyzed using the multiple linear 
regression technique. 

In numerous papers coming from Esteban's group, factor analysis techniques were 
applied to the electroanalytical study of metal ion interactions with macromolecular ligands 
such as polycarboxylates, yielding slow mobile complexes [L24], cysteine-containing 
peptides yielding very strong complexes with heavy metals [L25-L31], monomeric weak 
complexing agents, such as carboxylates, yielding consecutive labile complexes with low 
formation constants [L32], strong complexing ligands, such as nitrilotriacetic acid (NTA), 
which yield 1 : 1 metal complexes showing either labile or inert characteristics depending on 
the different time window of the technique used [L33]. The major part of these studies was 
performed by DPP, because of its high resolution, although DPASV and normal and reverse 
pulse polarographic techniques were also used. Metal-binding properties of the peptides 
were studied on the example of cadmium complexes analyzed with LSV [L34] and CV 
[L35] which are considered to be the most effective and versatile electroanalytical 
techniques. These, however, have a drawback connected with poor resolution of 
overlapping signals. DPP and direct current polarography techniques were employed in the 
study of three successive Zn-glycine complexes [L36], the first two being electrochemically 
labile and the third one being inert. In all cases discussed in this paragraph, multivariate 
curve resolution with alternating least squares (MCR-ALS) was used. Diaz-Cruz et al. 
[L37] demonstrated the potential usefulness of voltammetry in combination with hard- and 
soft-(MCR-ALS)-modeling data analysis for the study of peptide complexation equilibria of 
metal ions such as Zn which have neither relevant spectroscopic properties nor proper 
isotopes for NMR measurements. Fernandez et al. [L38] showed that a soft modeling 
approach for the voltammetric data analysis for labile Cd- and Pb-glycine complexes 
provides good estimations of the complexation parameters as verified by the classical 
DeFord-Hume method. Soft modeling proved also useful for analysis of complex 
polarographic data applied to the study of the copper-binding ability of tannic acid in the 
presence of simultaneously occurring phenomena such as electrodic adsorption, overlapping 
signals or stabilization of intermediate Cu(I) species [L39]. Esteban et al. [L40] presented a 
general overview of the application of the MCR-ALS method to metal complexation studies 
by voltammetric techniques, mostly by DPP. Diaz-Cruz et al. [L41] employed MCR-ALS 
for analysis of DPP signals measured for the systems Zn²⁺ + glutathione and 
Cd²⁺ + 1,10-phenanthroline. These systems, respectively, yield two and three successive and 
electroactive complexes, which are inert on the time scale of the electrochemical experiment. 

Berzas et al. [L42] compared the applicability of two multicomponent analysis 
methods, square wave voltammetry by PLS and adsorptive stripping square wave 
voltammetry by PLS, to the resolution of overlapping reduction peaks corresponding to the 
reduction processes of sulphamethoxypyridazine and its synergetic potentiator, trimethoprim 
to conclude that the stripping of adsorbed species proved to be more sensitive. 

Saurina et al. [L43] employed PCR and PLS for calibration calculation of the CV 
data for a mixture of oxidizable amino acids (cysteine, tyrosine and tryptophan) at a 
graphite-methacrylate composite electrode obtaining satisfactory results for cysteine and 
tryptophan. 

Herrero and Cruz Ortiz [L44] used the piecewise direct standardization (PDS) 
method for PLS calibration model transfer in order to incorporate the temporal changes of 
the system due to formation of numerous intermetallic compounds affecting the 
polarographic determination of copper, lead, cadmium and zinc. The same authors [L45] 
applied PLS regression to the simultaneous determination of thallium and lead by DPASV. 
In this paper Herrero and Cruz Ortiz [L45] used PDS in order to transfer the calibration 
model from one day to another. Herrero and Cruz Ortiz [L46] employed PLS regression to a 
calibration problem where, in addition to electrode reactions that give the DPP peaks, a 
coupled chemical reaction, dimerization, coexists. The investigated component was 
benzaldehyde. The same authors [L47] employed the PLS regression in order to solve the 
significant matrix interference caused by iron in the copper determination by DPASV. 
Application of two standardization procedures, PDS and global calibration transfer was also 
demonstrated in this paper [L47]. 

Herrero and Cruz Ortiz [L48] applied a genetic algorithm as a variable selection 
method in the multivariate analysis with PLS regression of several DPP and DPASV data 
sets, where various interferences are present (coupled reactions, formation of intermetallic 
compounds, overlapping signals and matrix effect). 
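
The calibration transfer used in the Herrero and Cruz Ortiz studies above relies on piecewise direct standardization (PDS). The sketch below shows the simpler global variant, direct standardization (DS), which estimates one matrix mapping responses from a secondary condition onto the primary one. PDS applies the same idea within a moving window of neighboring variables. The synthetic peak shapes, the gain and shift of the secondary response, and the `rcond` cut-off are assumptions chosen for the example.

```python
import numpy as np


def peak(potential, center, width):
    """Synthetic stand-in for the response of one pure component."""
    return np.exp(-0.5 * ((potential - center) / width) ** 2)


def direct_standardization(X_primary, X_secondary, rcond=1e-8):
    """Return F such that X_secondary @ F approximates X_primary.

    The pseudo-inverse discards singular values below rcond times the largest
    one, which keeps the estimate stable for rank-deficient transfer sets.
    """
    return np.linalg.pinv(X_secondary, rcond=rcond) @ X_primary


rng = np.random.default_rng(1)
potential = np.linspace(0.0, 1.0, 120)
pure_primary = np.vstack([peak(potential, 0.40, 0.06),
                          peak(potential, 0.60, 0.06)])
# Hypothetical secondary response: attenuated, shifted and broadened peaks.
pure_secondary = 0.8 * np.vstack([peak(potential, 0.42, 0.07),
                                  peak(potential, 0.62, 0.07)])

# Transfer set: the same six solutions measured under both conditions.
transfer_conc = rng.uniform(0.5, 2.0, size=(6, 2))
F = direct_standardization(transfer_conc @ pure_primary,
                           transfer_conc @ pure_secondary)

# A new solution measured only under the secondary condition.
new_conc = np.array([[1.3, 0.9]])
corrected = (new_conc @ pure_secondary) @ F
max_error = float(np.max(np.abs(corrected - new_conc @ pure_primary)))
```

After correction, the secondary measurement can be passed to the regression model built on the primary data without recalibration. In this noise-free example `max_error` is at the level of rounding error; real data would leave a residual.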

Sanz et al. [L49] developed a procedure for determining the capability of 
discrimination and evaluated this procedure using PLS calibration of benzaldehyde 
calculated based on DPP data. 

Signal Processing 

Signal processing is a discipline of chemometrics that is concerned with 
manipulation of analytical data to make the information contained in the data more 
accessible. 

Theoretical studies of the Fourier transform of voltammetric peaks, waves, and 
reversible LSV curves have been undertaken by Engholm [L50,L51]. Simons et al. [L52] 
employed Legendre polynomials for data reduction and noise filtering of amperometric 
signals. Four signal processing techniques: moving average smoothing, polynomial 
smoothing, rectangular low-pass filtering and exponential low-pass filtering were compared 
for use in potentiometric stripping analysis. Rectangular low-pass filtering was the most 
effective technique in enhancing the resolution of overlapping peaks [L53]. Stripping 
voltammetry data were subjected to signal processing such as background subtraction, 
ensemble averaging, digital filtering in the time and frequency domains, multiple scanning, 
and deconvolution [L54]. Signal processing methods: finite impulse response (FIR) and 
infinite impulse response (IIR) filters were employed for signal-to-noise ratio enhancement 
[L55]. The moving median filter was applied to potentiometric data. It removed the outliers 
without significant distortion of the signal while enhancing the signal-to-noise ratio [L56]. 
Zhou and Mo [L57] applied B-spline wavelet multifrequency channel decomposition for 
signal processing in the LSV. Zheng and Mo [L58] used B-spline wavelet coupled with 
Riemann-Liouville transform for signal processing in the staircase voltammetry. Chow et 
al. [L59] employed Fourier techniques for signal filtering of potentiometric stripping 
analysis data. 
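
The moving median filter mentioned above can be sketched in a few lines. The window length, the edge handling (the window is simply truncated at the ends) and the sample data are assumptions made for illustration, not details taken from [L56].

```python
from statistics import median


def moving_median(signal, window=5):
    """Replace each sample by the median of its neighborhood.

    The window is truncated at both ends of the signal, so the output has the
    same length as the input.
    """
    half = window // 2
    filtered = []
    for k in range(len(signal)):
        lo = max(0, k - half)
        hi = min(len(signal), k + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered


# Hypothetical slowly rising signal with a single outlier at index 3.
raw = [1.0, 1.1, 1.2, 9.0, 1.4, 1.5, 1.6]
filtered = moving_median(raw, window=3)
```

Unlike a moving average, the median discards the spike entirely instead of smearing it over neighboring samples.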

Expert Systems 

Expert systems are a relatively large area of application of chemometric techniques 
in electrochemistry. An expert system is a method of classification which is a simple 
hierarchy of user-defined rules that are used to evaluate the data. An expert system 
translates a heuristic method into a decision tree that can be implemented to automate the 
analysis of data for a particular problem. 
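
The idea of a rule hierarchy evaluated as a decision tree can be sketched as follows. The rules, thresholds, field names and recommended procedures are purely hypothetical and serve only to show the structure; they are not drawn from any of the expert systems cited below.

```python
def choose_procedure(sample):
    """Walk a small, hand-written rule hierarchy and return a recommendation.

    `sample` is a dict with the hypothetical keys
    'expected_conc_mol_per_l' and 'overlapping_peaks'.
    """
    # Rule 1: trace levels call for a preconcentration step.
    if sample["expected_conc_mol_per_l"] < 1e-7:
        return "stripping voltammetry with preconcentration"
    # Rule 2: overlapping signals call for multivariate resolution.
    if sample["overlapping_peaks"]:
        return "differential pulse measurement with multivariate resolution"
    # Default branch.
    return "direct differential pulse measurement"


recommendation = choose_procedure(
    {"expected_conc_mol_per_l": 1e-5, "overlapping_peaks": True}
)
```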

Palys et al. [L60-L63] applied a knowledge-based system to the voltammetric 
elucidation of electrode reaction mechanism. The expert system designs experiments, 
controls the voltammetric or coulometric run, and collects data for each of the experiments 
used in the automated mechanism elucidation. Esteban and co-workers [L64-L69] 
developed an expert system for voltammetric determination of trace metals, which guides 
the user on choice of sample treatment and the best choice of voltammetric procedure. 
Provision is made for identification and resolution of overlapping peaks and quantification 
by means of the multiple standard addition method with statistical validation test. Garcia- 
Armada et al. [L70] developed a knowledge-based system for DPP. A database of 
information about possible constituents of the system to be studied can be processed to 
facilitate the best approach for simultaneous multielement analysis with maximum 
efficiency, interpret the resulting data, and identify the constituents of the sample. 

Summary of the Invention 

The present invention relates to application of numerous chemometric techniques of 
design of experiment (DOE), modeling power, outlier detection, regression and calibration 
transfer for analysis of voltammetric responses obtained from various plating baths. A 
novel parameter obtained by multiplying modeling power by the squared least-squares 
regression coefficient proves to be a useful tool for determining the optimal part of a 
voltammogram taken for calibration calculations. Several methods were demonstrated for 
outlier detection within the training set to be applied prior to regression calculation. The 
techniques for determining the optimal number of factors for regression calculation were 
presented. These techniques, while iteratively coupled with numerous discussed methods of 
outlier detection within the training set by regression calculation, can produce an outlier free 
training set to be used for final calibration calculations. 
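
One common way to choose the number of factors is to compute the prediction error sum of squares (PRESS) by leave-one-out cross-validation for each candidate number of factors. The sketch below does this for PCR on synthetic data. It assumes the usual definition of PRESS as the sum of squared cross-validated prediction errors. The equation numbering and the exact criteria of this specification are not reproduced, and all names and data are hypothetical.

```python
import numpy as np


def pcr_coefficients(X, y, n_factors):
    """Textbook PCR on mean-centered data; returns means and regression vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_factors].T
    q, *_ = np.linalg.lstsq((X - x_mean) @ V, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ q


def loo_press(X, y, max_factors):
    """Leave-one-out PRESS for 1..max_factors factors."""
    press = []
    for n_factors in range(1, max_factors + 1):
        total = 0.0
        for i in range(len(y)):
            keep = np.arange(len(y)) != i          # drop sample i
            x_mean, y_mean, b = pcr_coefficients(X[keep], y[keep], n_factors)
            predicted = (X[i] - x_mean) @ b + y_mean
            total += float((predicted - y[i]) ** 2)
        press.append(total)
    return press


rng = np.random.default_rng(2)
potential = np.linspace(0.0, 1.0, 100)
pure = np.vstack([np.exp(-0.5 * ((potential - 0.45) / 0.08) ** 2),
                  np.exp(-0.5 * ((potential - 0.55) / 0.08) ** 2)])
conc = rng.uniform(0.5, 2.0, size=(15, 2))
X = conc @ pure + rng.normal(scale=0.01, size=(15, 100))

press = loo_press(X, conc[:, 0], max_factors=4)
```

With two independently varying components, PRESS is expected to drop sharply from one factor to two and to change little afterwards. A statistical test on the PRESS values, rather than the bare minimum, is the more cautious way to pick the final number.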

It has been demonstrated that multivariate regression methods can create a robust 
calibration model based on data that are virtually useless for univariate regression methods. 
It has been discovered that by combining into one data file data obtained using different 
techniques one may create a more accurate calibration model than that calculated for any 
single technique. The novel method is based on "gluing" parts of different voltammograms 
(but obtained for the same solution) prior to decomposition and multivariate regression 
calculation. Powerful chemometric regression techniques provide robust, multivariate 
calibration that can be reliably transferred from the primary instrument to secondary 
instruments. Data sets passing outlier detection tests are used for regression 
calculations. The information obtained about the concentration of deliberately added bath 
constituents can be used to maintain the desired constituent concentrations within limits in 
order to ensure optimal plating bath performance. 
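
The "gluing" step can be pictured as a column-wise concatenation of selected ranges from two data matrices whose rows correspond to the same solutions. The sketch below uses the ranges quoted for Figure 10 (points 401-701 of one channel and 301-601 of another). Treating those ranges as inclusive, zero-based column indices is an assumption, as are the function name and the random stand-in data.

```python
import numpy as np


def glue(scans_a, scans_b, range_a, range_b):
    """Concatenate inclusive column ranges of two scan matrices row by row.

    Row k of both inputs must come from the same solution.
    """
    a0, a1 = range_a
    b0, b1 = range_b
    return np.hstack([scans_a[:, a0:a1 + 1], scans_b[:, b0:b1 + 1]])


rng = np.random.default_rng(3)
channel_3 = rng.normal(size=(10, 900))   # stand-in for 10 scans, channel 3
channel_4 = rng.normal(size=(10, 900))   # stand-in for 10 scans, channel 4

glued = glue(channel_3, channel_4, (401, 701), (301, 601))
```

The glued matrix has 602 columns (301 from each channel) and can be passed to decomposition and regression exactly like a single voltammogram. If the two channels differ strongly in scale, scaling each block before gluing may be advisable.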

Brief Description of the Drawings 

Figure 1 shows an example of a cyclic DC voltammogram scan dq21b26, ch2 for 
CUBATH® ViaForm™ (Enthone) copper plating bath. 

Figure 2 shows the squared correlation coefficients (Equation 7), r^2, for 
self-predicted, autoscaled tin concentrations obtained via least squares regression for each point 
of the voltammogram, variable j (scan dq21xl0, channel 2), and the modeling power 
(Equation 11), R, calculated for the same data. 

Figure 3a shows PRESS (Equation 25) calculated for various numbers of factors for 
self-predicted by PCR and PLS-1 brightener concentrations (scan dq21cu, channel 2, range 
670-765). 

Figure 3b shows PRESS (Equation 25) calculated for various numbers of factors for 
self-predicted by PCR and PLS-1 carrier concentrations (scan dq21s4, ch 5, range 440-470). 

Figure 4a shows PRESS (Equation 25) calculated for various numbers of factors for 
cross-validated by PCR and PLS-1 brightener concentrations (scan dq21cu, channel 2, range 
670-765). 

Figure 4b shows PRESS (Equation 25) calculated for various numbers of factors for 
cross-validated by PCR and PLS-1 carrier concentrations (scan dq21s4, ch 5, range 
440-470). 

Figure 5 shows Fpress (Equation 28) calculated for various numbers of factors for 
cross-validated brightener concentrations (scan dq21cu, channel 2, range 670-765) and 
carrier concentrations (scan dq21s4, ch 5, range 440-470). 

Figure 6a shows Exner ψ function (Equation 29) calculated for the same 
concentration data as that of Figure 4a. 

Figure 6b shows Exner ψ function (Equation 29) calculated for the same 
concentration data as that of Figure 4b. 

Figure 7 shows a plot of leverages versus externally Studentized concentration 
residuals for brightener (scan dq21ba2, channel 5, range 300-860, 4 factors). 

Figure 8 shows an example of a cyclic AC (X first harmonic) voltammogram scan 
dq21b26, ch3 for PC75 plating bath. 

Figure 9 shows actual (diamond) and cross validated by PCR (square) and by PLS-1 
(triangle) acid concentration values for PC75 plating bath calibration; scan dq21b26, ch 3, 
range 4000-4800, 3 factors. 

Figure 10 shows the squared correlation coefficients (Equation 36), (r_0)^2, for 
self-predicted brightener concentrations for the PC75 plating bath calibration, obtained via least squares 
regression for part of the scan dq21ba2, channel 3, range 401-701 (first 301 points) glued 
with scan dq21ba2, channel 4, range 301-601 (last 301 points). 

Figure 11a shows prediction of acid concentration on the secondary instrument 
calculated employing regression equation from the primary instrument without any 
standardization (scan dq21b26, channel 3, 3600-4350, 4 factors). 

Figure 11b shows prediction of acid concentration on the secondary instrument 
calculated employing regression equation from the primary instrument standardized with DS 
(scan dq21b26, channel 3, 3600-4350, 4 factors). 

Figure 11c is the same as Figure 11b but standardized with DSB. 

Figure 11d is the same as Figure 11b but standardized with PDS. 

Figure 11e is the same as Figure 11b but standardized with PDSB. 

Figure 11f is the same as Figure 11b but standardized with DSS. 

Figure 11g is the same as Figure 11b but standardized with DSBS. 

Detailed Description of the Invention 

In accordance with the present invention, an apparatus and a method for voltammetric 
analysis of the plating bath are provided. Analysis includes preliminary examination of the 
voltammograms for any disturbances in the bath performance, followed by quantitative 
determination of concentrations of all deliberately added bath components. 

DC-AC Voltammetric techniques have been used for monitoring concentrations of 
bath constituents before, however the analysis of the voltammograms has been based on the 
single point usually corresponding to the peak current. This type of analysis is much less 
accurate and less reliable than chemometric analysis applying PCR or PLS methods which 
are used in the method described here. Several methods for qualifying the voltammograms 
prior to using them for prediction calculations of constituents' concentration are presented. 
These methods are able to detect changes in the shapes of voltammograms reflecting either 
changes in the bath composition (due to, for instance, contamination or concentrations of 
constituents being out of calibration range) or conditions under which the bath is running 
(for instance, a different temperature). All these reasons may impede the performance of the 
plating bath and therefore should be detected as soon as possible to enable the operator to 
stop plating and correct them before running further plating of, for instance, expensive 
materials like silicone wafers for the electronic industry. 

The method of the present invention involves the steps of applying a time-varying 
potential to a working electrode in contact with the plating bath solution, and 
measuring the response signal. The characteristics of the response signal vary in accordance 
with the concentrations of constituents within the solution, and thereby provide an accurate 
real-time indication of those concentrations. 

In accordance with a preferred embodiment of the present invention, an AC signal 
superimposed on a DC sweep signal is applied to a working electrode which has been 
pretreated by a DC potential and is in contact with the plating bath solution. The DC sweep 
signal is varied at a selected sweep rate over a selected voltage range. An AC response 
current signal is thereby produced which includes peaks indicative of the concentration 
levels of constituents within the plating bath. The method establishes a set of optimal 
electrochemical parameters for an exemplary plating bath and its respective constituents. 

As a feature of the present invention, the method eliminates the delay, expense and 
complexity typically associated with analysis methods requiring wet chemical analysis. 
Specialized chemical equipment and chemical analysis personnel are no longer required. 
The measurement results are available in real time, which facilitates continuous and efficient 
control of plating bath chemistry. 

The above-discussed features and attendant advantages of the present invention will 
become better understood by reference to the detailed description of the preferred 
embodiment and the accompanying drawings. 

Unless otherwise stated, computations were done using the Matlab Ver. 6.0 
environment (The MathWorks, Inc., Natick, MA) with the PLS_Toolbox Ver. 2.1.1 
(Eigenvector Research, Inc., Manson, WA). 

Experiment design and data description 

The plating bath consists of several components, both inorganic and organic, whose 
concentrations should be maintained within the ranges recommended by the bath 
manufacturer in order to assure satisfactory plating performance. The calibration of the 
probe for analyzing the plating bath should provide maximum information about the bath 
behavior for as many concentration combinations as possible within the specified ranges. 

In order to assure as uniform a distribution of concentration combinations as possible 
within the calibration ranges, a linear orthogonal array was applied for the experiment design. 
The chosen linear orthogonal array consists of 25 rows (which correspond to the solutions of 
the training set) distributing the concentrations of 5 or 6 bath components over 5 different 
levels. An example of a linear orthogonal array designed for the six-component Enthone 
CUBATH® ViaForm™ bath is shown in Table 1. 



TABLE 1. Composition of calibration solutions for copper plating bath calculated as 
5-level-6-component-25-row linear orthogonal array 

Solution #   Copper   Acid   Chloride   Accelerator   Leveler   Suppressor 
             g/L      g/L    ppm        mL/L          mL/L      mL/L 
1            14       140    20         1             0.5       5 
2            14       155    33.75      1.6           1.1       6.3 
3            14       170    47.5       2.3           1.75      7.5 
4            14       185    61.25      2.9           2.4       8.8 
5            14       200    75         3.5           3         10 
6            15.5     140    33.75      2.9           3         7.5 
7            15.5     155    47.5       3.5           0.5       8.8 
8            15.5     170    61.25      1             1.1       10 
9            15.5     185    75         1.6           1.75      5 
10           15.5     200    20         2.3           2.4       6.3 
11           17       140    47.5       1.6           2.4       10 
12           17       155    61.25      2.3           3         5 
13           17       170    75         2.9           0.5       6.3 
14           17       185    20         3.5           1.1       7.5 
15           17       200    33.75      1             1.75      8.8 
16           18.5     140    61.25      3.5           1.75      6.3 
17           18.5     155    75         1             2.4       7.5 
18           18.5     170    20         1.6           3         8.8 
19           18.5     185    33.75      2.3           0.5       10 
20           18.5     200    47.5       2.9           1.1       5 
21           20       140    75         2.3           1.1       8.8 
22           20       155    20         2.9           1.75      10 
23           20       170    33.75      3.5           2.4       5 
24           20       185    47.5       1             3         6.3 
25           20       200    61.25      1.6           0.5       7.5 

The typical concentration ranges for copper, acid, chloride, accelerator, leveler and 
suppressor are 14-20 g/L, 140-200 g/L, 20-75 ppm, 1.0-3.5 mL/L, 0.5-3.0 mL/L and 5-10 
mL/L, respectively. Prior to the calibration, 25 solutions were prepared according to the 
concentration values in Table 1. Each of these solutions was electroanalyzed twice by 
recording a set of voltammograms. 
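The layout of Table 1 can be reproduced with a short sketch. The modular rule used below (row indices a and b combined modulo 5) was inferred here from the tabulated values; the source does not state how the array was constructed, so the rule, the function names and the dictionary keys are assumptions made for illustration only.

```python
from itertools import combinations

# Five levels per component, read from Table 1.
LEVELS = {
    "copper_g_per_L": [14, 15.5, 17, 18.5, 20],
    "acid_g_per_L": [140, 155, 170, 185, 200],
    "chloride_ppm": [20, 33.75, 47.5, 61.25, 75],
    "accelerator_mL_per_L": [1, 1.6, 2.3, 2.9, 3.5],
    "leveler_mL_per_L": [0.5, 1.1, 1.75, 2.4, 3],
    "suppressor_mL_per_L": [5, 6.3, 7.5, 8.8, 10],
}

def level_indices(a, b):
    """Level index of each component for row 5*a + b (assumed rule)."""
    return [a, b, (a + b) % 5, (3 * a + b) % 5, (4 * a + b) % 5, (2 * a + b) % 5]

def build_design():
    """Return the 25 calibration solutions as rows of six concentrations."""
    names = list(LEVELS)
    rows = []
    for a in range(5):
        for b in range(5):
            idx = level_indices(a, b)
            rows.append([LEVELS[name][k] for name, k in zip(names, idx)])
    return rows

def is_orthogonal(rows):
    """True if every pair of columns shows each level combination exactly once."""
    for c1, c2 in combinations(range(len(rows[0])), 2):
        if len({(r[c1], r[c2]) for r in rows}) != 25:
            return False
    return True

design = build_design()
```

With 25 rows and five levels per column, each pair of components takes every one of the 25 level combinations exactly once, which is the sense in which the concentration combinations are distributed uniformly over the calibration ranges.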

The data of the training set consist of independent variables (voltammograms) and 
dependent variables (the concentrations corresponding to the voltammograms). The number 
of independent variables, which corresponds to the chosen number of points of the 
voltammogram taken for the analysis, equals n. The number of dependent variables equals 
unity in the cases discussed below. The number of samples in the training set is m. 

The original data consist of a matrix of independent variables, X^O(m,n), and a vector 
of dependent variables, c^O(m). The superscript "O" denotes original (that is, not 
transformed) data. In the example discussed in Table 1, m equals 50 (duplicate runs for 25 
solutions). 

According to the formalism employed herein, a bold capital letter denotes a 
matrix. Some matrices are described by two bold letters, the first of which is capital. 
Bold lowercase letters denote a vector. The superscripts "T" and "-1" 
denote a transposed matrix/vector and an inverse matrix, respectively. The subscript "u" 
denotes an unknown sample(s). 

Data preprocessing 

Preprocessing refers to the transformation of the original data in order to enhance the 
information representation. After the transformation, a variable is referred to as a feature to 
distinguish it from the original variable. 

The preprocessing method used throughout these examples is autoscaling to unit variance 
[1,2], which refers to mean centering followed by division by the standard deviation, s_j, on a 
variable-by-variable basis: 

$$x_{i,j} = \frac{x^{O}_{i,j} - \bar{x}_j}{s_j} \qquad (1)$$

where 

$$\bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x^{O}_{i,j} \qquad (2)$$

and 

$$s_j = \sqrt{\frac{\sum_{i=1}^{m}\left(x^{O}_{i,j} - \bar{x}_j\right)^{2}}{m-1}} \qquad (3)$$

Application of autoscaling transforms the original variables X^O and c^O into the features X 
and c, respectively. 

Another method of data preprocessing occasionally applied is mean centering, 
described by the following equation: 

$$x_{i,j} = x^{O}_{i,j} - \bar{x}_j \qquad (4)$$
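The autoscaling step, and its reuse for samples outside the training set, can be illustrated with a short Python sketch. It is a minimal illustration, not the implementation used for the reported results (which relied on Matlab and Visual Basic). The function names `fit_autoscale`, `autoscale` and `unscale` and the toy numbers are assumptions introduced here, and the standard deviation is taken with an m - 1 denominator.

```python
from math import sqrt

def fit_autoscale(X):
    """Return per-column means and standard deviations (m - 1 denominator)."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    stds = [sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (m - 1))
            for j in range(n)]
    return means, stds

def autoscale(X, means, stds):
    """Mean-center each column and divide it by its standard deviation."""
    return [[(row[j] - means[j]) / stds[j] for j in range(len(means))]
            for row in X]

def unscale(values, mean, std):
    """Map scaled (predicted) features back to original units."""
    return [v * std + mean for v in values]

# Hypothetical training set: 3 samples, 2 voltammogram points.
X_train = [[1.0, 10.0], [2.0, 30.0], [3.0, 50.0]]
means, stds = fit_autoscale(X_train)
X_scaled = autoscale(X_train, means, stds)

# A new voltammogram is scaled with the training-set parameters, not its own.
x_new = autoscale([[4.0, 20.0]], means, stds)[0]

# A predicted scaled concentration is returned to original units the same way
# (17.0 and 2.0 stand in for the training-set concentration mean and deviation).
c_pred = unscale([0.5], 17.0, 2.0)
```

Keeping the fitted means and deviations separate from the transformation makes it explicit that prediction samples never contribute to the scaling parameters.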

Unless otherwise stated, all features, both dependent (c) and independent (X), in the 
calculations presented below are assumed to be autoscaled to unit variance. Independent 
variables for prediction are transformed prior to the calculations using the scaling 
parameters of the training set. Predicted concentrations (dependent variables) are obtained 
via retransformation of the predicted dependent features using the scaling parameters of the 
training set. 

Traditional methods of calibration calculation 

Traditional methods of calibration calculation are based on univariate regression. 
In voltammetry, the characteristic points regressed against concentrations are usually 
peak currents or peak charges (calculated by integration of peaks in the time domain). 
Figure 1 shows an example of a cyclic voltammogram recorded for the CUBATH® ViaForm™ 
(Enthone) damascene copper plating bath. The first diffusion-controlled peak (range 
1500-2000) corresponds to the copper ion reduction process. As the potential reaches more 
negative values, the hydrogen ion reduction process (leading to the evolution of gaseous 
hydrogen) starts to interfere with the copper ion reduction. The direction of potential 
change is reversed at point 2900 of the voltammogram. Starting from point 4355, one can 
observe the copper oxidation peak. In the CVS method this oxidation peak is considered to 
be correlated with the accelerator concentration in the plating bath. Least squares 
regression was applied in an attempt to correlate both the oxidation peak height and the peak 
area (obtained via peak current integration) with the concentrations of all components 
present in the CUBATH® ViaForm™ plating bath. 

The data for calibration were obtained by running each of the 25 solutions of 
Table 1 twice. Both independent and dependent variables 
were autoscaled prior to the regression calculation. 

TABLE 2. Squared correlation coefficients for self-prediction for concentrations of 
components of copper plating bath regressed by least squares against integrated oxidation 
peak (columns 1, 3) and oxidation peak height (columns 2, 4); scan dq21b26, ch 2, range 
4355-5150 

               Squared correlation coefficient (r^2) 
               1                  2                  3                     4 
Component      Full calibration   Full calibration   Limited calibration   Limited calibration 
               integrated peak    peak height        integrated peak       peak height 
Copper         0.807              0.653 
Acid           0.0213             0.0881 
Chloride       0.109              0.169 
Accelerator    0.0404             0.0639             0.885                 0.865 
Leveler        2.81E-04           7.60E-04           0.0212                0.0141 
Suppressor     5.28E-05           2.80E-03           0.0244                0.1059 


The squared correlation coefficients of self-prediction are presented in Table 2, 
columns 1 and 2. One can notice that only the copper concentration can be correlated to 
some extent (although not satisfactorily according to the standards discussed further) with 
peak height and peak area. In order to find a CVS-type correlation between the accelerator 
concentration and the copper oxidation peak height/area, a limited calibration was conducted 
in which only the concentrations of the organic additives were varied. The composition of 
the 9 solutions used for the limited calibration matrix is presented in Table 3. 



TABLE 3. Composition of solutions for limited calibration (organic additives 
concentrations varied only) for copper plating bath calculated as 3-level-3-component-8-row 
linear orthogonal array plus the nominal solution (ninth row) 

Solution #   Copper   Acid   Chloride   Accelerator   Leveler   Suppressor 
             g/L      g/L    ppm        mL/L          mL/L      mL/L 
1            17.5     175    50         1             0.5       6 
2            17.5     175    50         2             1.5       5 
3            17.5     175    50         3             2.5       5 
4            17.5     175    50         1             1.5       7.5 
5            17.5     175    50         2             2.5       7.5 
6            17.5     175    50         3             0.5       7.5 
7            17.5     175    50         1             2.5       10 
8            17.5     175    50         2             0.5       10 
9            17.5     175    50         3             1.5       10 






Concentrations of copper, acid and chloride were kept constant in all solutions at 
their nominal values. The composition of the first eight solutions was 
calculated as a linear orthogonal array with two levels and three components (accelerator, 
leveler and suppressor). The ninth solution contains all components on their nominal level. 
The squared correlation coefficients of self-prediction are presented in Table 2, columns 3 
and 4. One can observe a correlation between the accelerator concentration and the oxidation 
peak height/area. However, even under these conditions the value of the squared correlation 
coefficient is lower than that obtained by the much more sophisticated chemometric 
regression techniques. Based on the results presented above, one can conclude that no 
approach analogous to CVS can be applied for on-line accelerator analysis in the 
plating bath, because of the influence of the variable concentrations of inorganic additives. 
The accelerator is the fastest-depleting component, and constant monitoring of its 
concentration is essential for proper maintenance of the plating bath. 

Determination of the calibration range 

In order to determine which part of the chosen voltammogram is the most promising 
for calibration of a given component, two independent procedures should be 
conducted for each j-th point of the DC/AC voltammogram: 

- correlation calculation based on least squares regression, 

- SIMCA (Soft Independent Modeling of Class Analogy) based calculation of modeling power [1]. 

The first method shows which range of the voltammogram has 
the greatest correlation with the concentration of the component to be calibrated. It also 
determines the range where the AC or DC current responses depend only on changes in the 
concentration of the component of interest. Therefore a specific range has to be found for 
each component. The second method gives information about the signal-to-noise ratio 
for each point within the chosen range. 

The optimal range chosen for calibration of a given component should show a 
good correlation, be as independent as possible of concentration changes of constituents 
other than the calibrated one, and have a high signal-to-noise ratio. 



The algorithm for the correlation calculation based on least squares regression is 
as follows: 

- Both independent and dependent variables are autoscaled. 

- Regression is calculated for each point of the voltammogram. For the j-th point of the 
scaled voltammogram (also called feature j) one can write the following regression 
equation: 

$$\mathbf{c} = \mathbf{x}_j b_j \qquad (5)$$

where the regression coefficient is calculated via the equation: 

$$b_j = \frac{\sum_{i=1}^{m} x_{i,j}\, c_i}{\sum_{i=1}^{m} x_{i,j}^{2}} \qquad (6)$$

- Based on the regression coefficients, self-prediction is calculated for each point of the 
scaled voltammogram. 

- The squared correlation coefficients, $r_j^2$, are calculated for each j-th point of the 
scaled voltammogram: 

$$r_j^2 = \frac{\left(m\sum_{i=1}^{m}\hat{c}_i c_i - \sum_{i=1}^{m}\hat{c}_i \sum_{i=1}^{m} c_i\right)^{2}}{\left[m\sum_{i=1}^{m}\hat{c}_i^{2} - \left(\sum_{i=1}^{m}\hat{c}_i\right)^{2}\right]\left[m\sum_{i=1}^{m} c_i^{2} - \left(\sum_{i=1}^{m} c_i\right)^{2}\right]} \qquad (7)$$

where $\hat{c}_i$ is the predicted scaled concentration. 

- The range corresponding to high values of $r_j^2$ (as close to unity as possible) is 
selected as the calibration range containing n points. 
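The point-by-point regression of Equations 5-7 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the data are taken to be already autoscaled, the helper names and the tiny three-sample data set are hypothetical, and a feature whose self-prediction is constant is assigned a squared correlation of zero to avoid a division by zero.

```python
def slope_through_origin(x, c):
    """Regression coefficient b_j of Equation 6 for one scaled feature."""
    return sum(xi * ci for xi, ci in zip(x, c)) / sum(xi * xi for xi in x)

def squared_correlation(c_hat, c):
    """Squared correlation between predicted and actual values (Equation 7)."""
    m = len(c)
    s_hc = sum(h * v for h, v in zip(c_hat, c))
    s_h, s_c = sum(c_hat), sum(c)
    s_hh = sum(h * h for h in c_hat)
    s_cc = sum(v * v for v in c)
    den = (m * s_hh - s_h ** 2) * (m * s_cc - s_c ** 2)
    if den == 0.0:
        return 0.0  # constant prediction: no correlation can be computed
    return (m * s_hc - s_h * s_c) ** 2 / den

def r2_per_feature(X, c):
    """Self-prediction r_j^2 for every column (voltammogram point) of X."""
    result = []
    for j in range(len(X[0])):
        x = [row[j] for row in X]
        b = slope_through_origin(x, c)
        c_hat = [b * xi for xi in x]
        result.append(squared_correlation(c_hat, c))
    return result

def select_points(r2, threshold):
    """Indices of points whose r_j^2 reaches the chosen threshold."""
    return [j for j, value in enumerate(r2) if value >= threshold]

# Hypothetical scaled data: 3 samples, 3 voltammogram points.
X = [[-1.0, 1.0, -1.0],
     [0.0, -2.0, 1.0],
     [1.0, 1.0, 0.0]]
c = [-1.0, 0.0, 1.0]
r2 = r2_per_feature(X, c)
candidates = select_points(r2, 0.9)
```

In this toy example the first point tracks the concentration exactly, the second is uncorrelated with it, and the third is only weakly correlated, so only the first point would be carried forward into a calibration range.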



The SIMCA-based procedure for calculating the modeling power of the j-th point of 
the scaled voltammogram (feature j) is as follows: 

- The autoscaled training set matrix X(m,n) is decomposed by PCA into principal 
components, S(m,a), and eigenvectors, V(n,a). The number of factors, a, is determined 
by cross validation (in the examples discussed later in the text the optimal number of 
factors usually equals 3 or 4). 

- The matrix of residuals for the training set is calculated from the expression: 

$$\mathbf{E} = \mathbf{X} - \mathbf{S}\mathbf{V}^{T} \qquad (8)$$

- For each j-th point of the scaled voltammogram the residual variance of feature j, 
$rv_j(\mathrm{error})$, is computed from the following equation: 

$$rv_j(\mathrm{error}) = \frac{\sum_{i=1}^{m} e_{i,j}^{2}}{m-a-1} \qquad (9)$$

where $e_{i,j}$ is an element of the matrix E. 

- For each j-th point of the scaled voltammogram the meaningful variance in feature j, 
$rv_j(x)$, is given by: 

$$rv_j(x) = \frac{\sum_{i=1}^{m} x_{i,j}^{2}}{m-1} \qquad (10)$$

- The modeling power of feature j, $R_j$, is defined to be: 

$$R_j = 1 - \sqrt{\frac{rv_j(\mathrm{error})}{rv_j(x)}} \qquad (11)$$

As $R_j$ approaches unity, the feature is highly relevant; conversely, as it approaches 
zero, the feature approaches zero utility in the model. 

Figure 2 presents an example application of both the squared correlation coefficient, 
$r_j^2$ (obtained via least squares regression), and the modeling power, $R_j$, as criteria 
for determining the optimal calibration range of a voltammogram. Figure 2 is based on the 
analysis of CV voltammograms of tin redox electrode reactions in the tin/lead plating bath. 
The squared correlation coefficients, $r_j^2$, correlate the electroanalytical response with 
the actual tin concentration in the training set solutions for each j-th point of the 
voltammogram. One can notice that neither $r_j^2$ nor $R_j$ alone is a sufficient criterion 
to determine whether a given feature j is an optimal one to be included in the calibration 
range. Only features j for which both the $r_j^2$ and $R_j$ values are relatively high can 
be taken into the calibration (like range 55-210 in Figure 2). Therefore the analysis of the 
combined parameter $R_j \cdot r_j^2$ is helpful for determining the optimal calibration 
range, as demonstrated in Figure 2. It should be mentioned that the analysis of the 
$R_j \cdot r_j^2$ parameter provides only an estimated calibration range. The optimal 
calibration range should finally be determined via cross validation methods by also 
empirically checking ranges slightly wider than that suggested by $R_j \cdot r_j^2$. 
However, the optimal range should contain, in most cases, the whole range corresponding to 
significant values of the $R_j \cdot r_j^2$ parameter. It should also be mentioned that the 
calibration range might be extended to include only points that still have a high modeling 
power value. 
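The modeling power calculation can be sketched as follows. This is a hedged illustration, not the authors' code: PCA is done with a small NIPALS routine, the residual variance uses m - a - 1 degrees of freedom and the modeling power uses the square-root form common in the SIMCA literature, both of which are assumptions here. The function names and the synthetic data (two points that follow one latent trend and a third that barely does) are hypothetical.

```python
from math import sqrt

def autoscale(X):
    """Autoscale the columns of X to zero mean and unit variance."""
    m, n = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / m for j in range(n)]
    sd = [sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / (m - 1)) for j in range(n)]
    return [[(r[j] - mu[j]) / sd[j] for j in range(n)] for r in X]

def nipals(X, a, max_iter=1000, tol=1e-18):
    """PCA by NIPALS: returns (scores, loadings, residuals) for a factors."""
    E = [row[:] for row in X]
    m, n = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(a):
        # Start from the residual column with the largest sum of squares.
        start = max(range(n), key=lambda j: sum(E[i][j] ** 2 for i in range(m)))
        t = [E[i][start] for i in range(m)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
            norm = sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(E[i][j] * p[j] for j in range(n)) for i in range(m)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the factor just found from the residual matrix.
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, E

def modeling_power(X_raw, a):
    """Modeling power of every feature for a PCA model with a factors."""
    X = autoscale(X_raw)
    m, n = len(X), len(X[0])
    _, _, E = nipals(X, a)
    power = []
    for j in range(n):
        rv_error = sum(E[i][j] ** 2 for i in range(m)) / (m - a - 1)
        rv_x = sum(X[i][j] ** 2 for i in range(m)) / (m - 1)
        power.append(1.0 - sqrt(rv_error / rv_x))
    return power

# Hypothetical data: features 0 and 1 follow one latent trend, feature 2 does not.
latent = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
other = [1.0, -2.0, 1.0, 1.0, -2.0, 2.0]
X_raw = [[10 + 2 * t, 5 + 4 * t, 7 + z] for t, z in zip(latent, other)]
R = modeling_power(X_raw, a=1)
```

With one factor, the two trend-following features receive a high modeling power and the third a low one. Multiplying each value by the corresponding squared correlation coefficient would give the combined parameter discussed above.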

Outlier detection within the training set prior to regression calculation 

The next step of the analysis is the examination of the training set in order to 
identify and eliminate possible outliers prior to the calculation of the regression. The 
Principal Component Analysis (PCA) [3,4] method is applied to decompose the matrix X(m,n) 
into matrices that are outer products of vectors called scores (S(m,a)) and loadings 
(V(n,a)). Four different methods were used to decompose the data matrix X. The first two 
methods, nonlinear iterative partial least squares (NIPALS) [2,5] and successive average 
orthogonalization (SAO) [6], are pair-by-pair methods, while the Jacobi transformation [7,8] 
methods calculate all the principal components at once using the variance-covariance 
matrix. The results of all methods were practically identical. The PCA calculations were 
done in MS Visual Basic (VB) and were compared with the results obtained with the Matlab 
singular value decomposition technique, with which they were in full agreement. All 
computations connected with outlier detection discussed below were done both in VB and in 
Matlab, mostly in order to verify their correctness. In the VB programs the NIPALS method 
was chosen as optimal (based mostly on computation time) for the decomposition of the X 
matrix. 

In order to determine outliers in the training set, the Mahalanobis distance (MD) 
coupled with PCA (MD/PCA) was applied. One of the main reasons the Mahalanobis 
distance was chosen is that it is very sensitive to inter-variable changes in the training set 
data. In addition, the distance is measured in terms of the standard deviation from the mean 
of the training samples. The difference between the classical Mahalanobis distance and the 
Mahalanobis distance coupled with PCA is that in the latter the score matrix S replaces the 
data matrix X of the former. Prior to the calculation of the Mahalanobis distance it is 
necessary to calculate the Mahalanobis matrix (M) based on the scores of the whole training 
set: 

$$\mathbf{M} = \mathbf{S}^{T}\mathbf{S}/(m-1) \qquad (12)$$

The square of the Mahalanobis distance corresponding to the i-th sample in the training set 
is calculated from the following equation: 

$$D_i^{2} = \mathbf{s}_i \mathbf{M}^{-1} \mathbf{s}_i^{T} \qquad (13)$$

Samples having significantly larger values of D than the rest are eliminated from the 
training set as outliers. The remaining data are used to calculate the calibration. 

A more reliable approach for elimination of outliers from the training set is the 
Mahalanobis distance based on cross validation. In this method one checks the part of 
the training set based on the criterion of best predictive ability, as opposed to best fit (as 
in the self-prediction method presented above). The iterative procedure for cross validation 
using the Mahalanobis distance method coupled with PCA is presented below: 

- Set the value of index k=1. 

- Extract the k-th vector $\mathbf{x}^{O}_k(n)$ from the data matrix $\mathbf{X}^{O}(m,n)$. 
The remaining matrix is called $\mathbf{X}^{O}_k(m-1,n)$ and plays the role of the training 
set matrix in the k-th step. 

- The matrix $\mathbf{X}^{O}_k$ is autoscaled to unit variance to obtain $\mathbf{X}_k$. 

- The vector $\mathbf{x}^{O}_k$ is scaled using the scaling parameters of the matrix 
$\mathbf{X}^{O}_k$ to obtain $\mathbf{x}_k$. 

- The matrix $\mathbf{X}_k$ is decomposed into scores $\mathbf{S}_k(m-1,a)$ and eigenvectors 
$\mathbf{V}_k(n,a)$ using a factors. 

- The Mahalanobis matrix is calculated by applying the following dependence: 

$$\mathbf{M}_k = \mathbf{S}_k^{T}\mathbf{S}_k/(m-2) \qquad (14)$$

- Scores are calculated for the vector $\mathbf{x}_k$ using the equation: 

$$\mathbf{s}_k = \mathbf{x}_k \mathbf{V}_k \qquad (15)$$

- The square of the Mahalanobis distance corresponding to the k-th sample is calculated from 
the following equation: 

$$D_k^{2} = \mathbf{s}_k \mathbf{M}_k^{-1} \mathbf{s}_k^{T} \qquad (16)$$

- If k is less than m then increment k by one and return to the second step of this 
procedure. 
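The leave-one-out procedure above can be sketched in Python as follows. It is a minimal sketch under stated assumptions, not the VB or Matlab code referred to in the text: the helper names and the seven-sample data set are hypothetical, and because PCA score vectors are mutually orthogonal the Mahalanobis matrix is treated as diagonal, so its inverse reduces to a division by each diagonal element.

```python
from math import sqrt

def fit_scaler(X):
    """Column means and standard deviations (m - 1 denominator)."""
    m, n = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / m for j in range(n)]
    sd = [sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / (m - 1)) for j in range(n)]
    return mu, sd

def scale(X, mu, sd):
    """Autoscale rows of X with previously fitted parameters."""
    return [[(r[j] - mu[j]) / sd[j] for j in range(len(mu))] for r in X]

def nipals(X, a, max_iter=1000, tol=1e-18):
    """PCA by NIPALS: returns (scores, loadings, residuals) for a factors."""
    E = [row[:] for row in X]
    m, n = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(a):
        start = max(range(n), key=lambda j: sum(E[i][j] ** 2 for i in range(m)))
        t = [E[i][start] for i in range(m)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
            norm = sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(E[i][j] * p[j] for j in range(n)) for i in range(m)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, E

def md_pca_cross_validation(X_raw, a):
    """Squared Mahalanobis distance of each sample predicted from the others."""
    m = len(X_raw)
    d2 = []
    for k in range(m):
        train = X_raw[:k] + X_raw[k + 1:]          # leave sample k out
        mu, sd = fit_scaler(train)
        Xk = scale(train, mu, sd)
        xk = scale([X_raw[k]], mu, sd)[0]
        scores, loadings, _ = nipals(Xk, a)
        total = 0.0
        for t, p in zip(scores, loadings):
            m_ff = sum(v * v for v in t) / (m - 2)  # diagonal element of M_k (Eq. 14)
            s = sum(x * w for x, w in zip(xk, p))   # score of left-out sample (Eq. 15)
            total += s * s / m_ff                   # contribution to D_k^2 (Eq. 16)
        d2.append(total)
    return d2

# Hypothetical two-point "voltammograms"; the last sample lies far along the trend.
X_raw = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.1], [4.0, 7.9],
         [5.0, 10.1], [6.0, 11.9], [30.0, 60.0]]
D2 = md_pca_cross_validation(X_raw, a=1)
suspect = D2.index(max(D2))
```

In this toy set the last sample receives by far the largest squared distance, while the other samples stay small even though the outlier is part of their training subsets.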

Another method based on the Mahalanobis distance by principal component analysis 
employs not only scores but also residuals. The algorithm for this method, called 
Mahalanobis distance by principal component analysis with residuals (MD/PCA/R) [9], applied 
to cross-validation is presented below: 

- Set the value of index k=1. 

- Extract the k-th vector $\mathbf{x}^{O}_k(n)$ from the data matrix $\mathbf{X}^{O}(m,n)$. 
The remaining matrix is called $\mathbf{X}^{O}_k(m-1,n)$ and plays the role of the training 
set matrix in the k-th step. 

- The matrix $\mathbf{X}^{O}_k$ is autoscaled to unit variance to obtain $\mathbf{X}_k$. 

- The vector $\mathbf{x}^{O}_k$ is scaled using the scaling parameters of the matrix 
$\mathbf{X}^{O}_k$ to obtain $\mathbf{x}_k$. 

- The matrix $\mathbf{X}_k$ is decomposed into scores $\mathbf{S}_k(m-1,a)$ and eigenvectors 
$\mathbf{V}_k(n,a)$ using a factors. 

- The matrix of residuals for the k-th training set matrix, $\mathbf{E}_k(m-1,n)$, is 
calculated via the following equation: 

$$\mathbf{E}_k = \mathbf{X}_k - \mathbf{S}_k\mathbf{V}_k^{T} \qquad (17)$$

- The column vector of the sums of squared residuals, also called Q residuals, for the k-th 
training set, $\mathbf{rs}_k(m-1)$, is computed employing the following dependence: 

$$rs_{k,i} = \sum_{j=1}^{n} \left(e_{i,j}\right)^{2} \qquad (18)$$

where $e_{i,j}$ is an element of the matrix $\mathbf{E}_k$. 

- The column vector $\mathbf{rs}_k$ is added as the (a+1)-st column to the matrix of scores 
$\mathbf{S}_k(m-1,a)$. This creates a residual-augmented scores matrix, 
$\mathbf{T}_k(m-1,a+1)$. 

- The Mahalanobis matrix is calculated from the residual-augmented scores by applying 
the following dependence: 

$$\mathbf{Mr}_k = \mathbf{T}_k^{T}\mathbf{T}_k/(m-2) \qquad (19)$$

- Scores, $\mathbf{s}_k$, for the vector $\mathbf{x}_k$ are calculated using Equation 15. 

- The predicted row vector of residuals, $\mathbf{ep}_k(n)$, for the vector $\mathbf{x}_k$ 
is calculated using the following equation: 

$$\mathbf{ep}_k = \mathbf{x}_k\left(\mathbf{I} - \mathbf{V}_k\mathbf{V}_k^{T}\right) \qquad (20)$$

where I(n,n) is an identity matrix, that is, a square matrix containing ones on the 
diagonal and zeros everywhere else. 

- The predicted residual sum of squares, $rp_k$, for the vector $\mathbf{x}_k$ is computed 
employing the expression: 

$$rp_k = \sum_{j=1}^{n} ep_{k,j}^{2} \qquad (21)$$

- The scalar $rp_k$ is appended as the (a+1)-st value to the row vector $\mathbf{s}_k(a)$. 
This creates a residual-augmented scores vector, $\mathbf{t}_k(a+1)$. 

- The value of the squared Mahalanobis distance is predicted for the unknown sample by 
applying the following expression: 

$$D_k^{2} = \mathbf{t}_k \mathbf{Mr}_k^{-1} \mathbf{t}_k^{T} \qquad (22)$$

- If index k is less than m then increment k by one and return to the second step of this 
procedure. 
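The MD/PCA/R cross-validation steps can be sketched in Python as shown below. This is an illustrative sketch rather than the original implementation: the helper names, the small linear solver used in place of an explicit matrix inverse, and the seven-sample data set are all assumptions introduced here. The last sample is constructed to have an ordinary score but a large residual.

```python
from math import sqrt

def fit_scaler(X):
    m, n = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / m for j in range(n)]
    sd = [sqrt(sum((r[j] - mu[j]) ** 2 for r in X) / (m - 1)) for j in range(n)]
    return mu, sd

def scale(X, mu, sd):
    return [[(r[j] - mu[j]) / sd[j] for j in range(len(mu))] for r in X]

def nipals(X, a, max_iter=1000, tol=1e-18):
    """PCA by NIPALS: returns (scores, loadings, residuals) for a factors."""
    E = [row[:] for row in X]
    m, n = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(a):
        start = max(range(n), key=lambda j: sum(E[i][j] ** 2 for i in range(m)))
        t = [E[i][start] for i in range(m)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
            norm = sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(E[i][j] * p[j] for j in range(n)) for i in range(m)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, E

def solve(A, b):
    """Solve A y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def md_pca_r_cross_validation(X_raw, a):
    """Leave-one-out squared Mahalanobis distance with residuals (MD/PCA/R)."""
    m, n = len(X_raw), len(X_raw[0])
    d2 = []
    for k in range(m):
        train = X_raw[:k] + X_raw[k + 1:]
        mu, sd = fit_scaler(train)
        Xk = scale(train, mu, sd)
        xk = scale([X_raw[k]], mu, sd)[0]
        scores, loadings, E = nipals(Xk, a)
        rs = [sum(e * e for e in row) for row in E]                      # Eq. 18
        T = [[scores[f][i] for f in range(a)] + [rs[i]] for i in range(m - 1)]
        Mr = [[sum(T[i][u] * T[i][v] for i in range(m - 1)) / (m - 2)
               for v in range(a + 1)] for u in range(a + 1)]             # Eq. 19
        s = [sum(x * w for x, w in zip(xk, p)) for p in loadings]        # Eq. 15
        ep = [xk[j] - sum(s[f] * loadings[f][j] for f in range(a))
              for j in range(n)]                                         # Eq. 20
        rp = sum(e * e for e in ep)                                      # Eq. 21
        t = s + [rp]
        y = solve(Mr, t)                       # y = Mr^-1 t, without forming Mr^-1
        d2.append(sum(u * v for u, v in zip(t, y)))                      # Eq. 22
    return d2

# Hypothetical data: the last sample sits mid-range but off the common trend.
X_raw = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.1], [4.0, 7.9],
         [5.0, 10.1], [6.0, 11.9], [3.5, 10.0]]
D2 = md_pca_r_cross_validation(X_raw, a=1)
suspect = D2.index(max(D2))
```

The off-trend sample is flagged through the residual term even though its score alone is unremarkable, which is consistent with the greater sensitivity reported for MD/PCA/R.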

TABLE 4. MD/PCA self-prediction and cross-validation, and MD/PCA/R cross-validation 
calculated for s4 scan, range 200-250, and channels 4 and 5. Columns 1, 2, 3, 7, 8 and 9 are 
computed for the whole training set (prior to removal of outliers). Columns 4, 5, 6, 10, 11 
and 12 are calculated for the training set after removal of outliers. 

#    1         2         3         4         5         6         7         8         9         10        11        12 
     selfpred  Xval      XV Res    selfpred  Xval      XV Res    selfpred  Xval      XV Res    selfpred  Xval      XV Res 
1    3.13      3.53      3.83      3.29      3.77      3.86      2.98      3.33      3.34      3.27      3.75      3.81 
2    2.68      2.93      2.97      2.70      2.97      3.03      2.90      3.22      3.39      2.89      3.22      3.23 
3    2.44      2.62      2.66      2.49      2.70      2.75      2.94      3.24      3.46      3.13      3.54      3.54 
4    3.30      3.77      3.85      3.33      3.84      3.94      2.80      3.03      3.67      3.03      3.39      4.03 
5    3.33      3.82      3.89      3.38      3.91      4.17      2.59      2.69      6.56      3.09      3.48      5.52 
6    2.87      3.17      3.37      2.86      3.17      3.49      2.22      2.33      3.41      2.86      3.17      3.24 
7    2.40      2.58      3.50      2.72      2.98      3.16      2.24      2.39      2.43      2.75      3.02      3.02 
8    2.49      2.69      2.71      2.55      2.77      2.80      2.47      2.66      2.77      2.48      2.68      3.40 
9    2.79      3.08      3.37      2.77      3.05      3.74      2.65      2.88      2.93      2.64      2.89      3.75 
10   2.87      3.18      3.20      2.84      3.15      3.18      2.74      3.00      3.02      2.93      3.27      3.27 
11   2.96      3.29      3.31      3.02      3.39      3.60      2.68      2.92      3.07      3.21      3.65      4.22 
12   3.08      3.46      3.57      3.10      3.50      3.60      2.80      3.08      4.03      3.29      3.78      3.80 
13   1.08      1.11      1.68      1.57      1.63      1.66      1.94      2.01      2.67      2.00      2.11      2.29 
14   0.93      0.95      4.93      1.83      1.90      3.61      2.25      2.38      2.81      2.26      2.41      2.65 
15   0.85      0.87      2.77      1.71      1.77      2.11      2.17      2.29      2.58      2.16      2.30      2.37 
16   0.89      0.90      1.07      1.99      2.09      2.14      1.31      1.35      1.47      1.34      1.38      1.65 
17   1.09      1.11      1.16      2.44      2.63      2.66      1.51      1.56      1.68      1.51      1.56      2.66 
18   1.80      1.87      3.81      1.97      2.06      3.78      1.66      1.71      2.07      2.00      2.11      2.20 
19   1.32      1.36      1.36      1.30      1.34      1.35      0.84      0.85      1.39      1.34      1.38      1.43 
20   1.47      1.52      1.52      1.78      1.86      2.14      0.94      0.96      1.45      0.99      1.01      3.11 
21   1.55      1.61      1.61      1.69      1.76      1.86      0.98      1.00      1.60      1.82      1.90      1.96 
22   1.37      1.41      2.14      1.34      1.38      3.06      1.36      1.37      1.86      1.46      1.49      4.45 
23   1.53      1.59      2.15      2.45      2.64      2.84      1.86      1.86      2.66      2.55      2.77      2.79 
24   1.74      1.81      1.87      2.11      2.24      2.30      1.53      1.54      2.09      2.19      2.33      2.35 
25   1.16      1.19      1.20      1.20      1.23      1.23      1.09      1.11      1.62      1.27      1.30      1.32 
26   1.09      1.12      1.12      1.09      1.11      1.11      0.99      1.01      1.41      1.08      1.11      1.19 
27   0.85      0.87      0.88      0.83      0.85      0.87      1.10      1.13      1.40      1.10      1.13      1.38 
28   0.65      0.66      0.67      0.65      0.65      0.68      0.68      0.69      0.81      0.66      0.67      0.73 
29   0.79      0.80      1.14      0.78      0.79      1.57      0.65      0.66      0.78      0.68      0.69      0.80 
30   0.69      0.70      0.70      1.01      1.03      1.07      0.65      0.66      0.77      0.76      0.77      1.36 
31   0.63      0.64      0.69      0.83      0.84      0.92      0.63      0.63      0.72      0.76      0.77      0.81 
32   0.61      0.61      0.97      1.60      1.66      1.77      0.90      0.91      1.06      1.36      1.40      1.46 
33   1.03      1.05      1.06      1.35      1.39      1.46      0.65      0.66      1.00      1.21      1.24      1.28 
34   0.76      0.78      1.74      1.50      1.55      2.08      1.36      1.37      1.60      1.70      1.77      1.79 
35   1.41      1.46      1.46      1.93      2.03      2.26      1.15      1.15      1.60      1.82      1.91      1.94 
36   1.16      1.19      1.40      1.18      1.20      2.00      1.03      1.04      1.27      1.09      1.11      2.97 
37   1.08      1.10      1.32      1.14      1.17      1.86      1.02      1.04      1.51      1.06      1.08      1.58 
38   1.47      1.52      1.53      1.45      1.50      1.50      1.10      1.12      1.84      1.33      1.37      1.43 
39   1.95      2.05      2.67      2.31      2.47      5.61      1.13      1.16      2.33      2.18      2.32      2.59 
40   1.11      1.13      1.15      1.26      1.30      1.37      0.99      1.00      1.26      1.02      1.05      1.17 
41   1.12      1.15      1.17      1.35      1.39      1.44      1.00      1.02      1.18      1.00      1.02      1.14 
42   0.91      0.92      1.18      1.46      1.51      2.03      1.25      1.28      1.34      1.22      1.25      1.42 
43   0.85      0.86      0.88      1.23      1.26      1.41      1.09      1.11      1.15      1.06      1.09      1.14 
44   0.96      0.98      1.17      1.40      1.44      1.54      1.49      1.54      1.54      1.48      1.53      1.54 
45   [?]       -         349.84    -         -         -         0.74      -         5560.16   -         -         - 
46   1.66      1.72      2.63      2.00      2.11      2.73      2.30      2.44      2.47      2.38      2.56      2.56 
47   1.17      1.20      1.21      1.25      1.28      1.31      1.73      1.80      1.85      1.82      1.90      1.91 
48   1.37      1.41      1.49      1.78      1.86      1.95      1.97      2.06      2.09      1.98      2.08      2.09 
49   0.94      0.96      0.97      0.94      0.96      0.98      0.90      0.92      1.09      1.06      1.09      1.11 
50   1.27      1.31      1.49      2.03      2.14      2.49      0.94      0.96      1.39      1.57      1.62      2.02 
51   0.91      0.93      1.16      0.89      0.90      1.39      0.86      0.88      1.14      0.85      0.87      1.60 

MD/PCA self-prediction and cross-validation values, and MD/PCA/R cross-validation values, 
calculated for two data sets are presented in Table 4. Both data sets were obtained for the 
same training set of solutions. However, they differ from each other in the manner by which 
some experimental parameters were obtained. As expected, the MD/PCA/R values for 
cross-validation are slightly larger than the MD/PCA values for cross-validation, which in 
turn are larger than those for self-prediction (Table 4, columns 1, 2, 3 and 7, 8, 9). 
However, the sensitivity of outlier detection is definitely the largest for MD/PCA/R, as 
demonstrated by the example of sample #45. After removal of the outliers from the training 
set, the self-prediction and cross-validation MD/PCA and MD/PCA/R values were recalculated; 
they are presented in columns 4, 5, 6 and 10, 11, 12, respectively. 

Another powerful method for outlier detection is called SIMCA [1]. In order to 
check whether the whole training set consists of one class (in other words, whether there are 
no outliers within the training set) cross validation can be applied. The algorithm for 
SIMCA cross validation is following: 

- Set the value of index $k = 1$.

- Extract the k-th vector $x^{o}_{k}(n)$ from data matrix $X^{o}(m,n)$. The remaining matrix is
called $X^{o}_{k}(m-1,n)$ and plays the role of the training set matrix in the k-th step.

- Matrix $X^{o}_{k}$ is autoscaled to unit variance to obtain $X_{k}$.

- The vector $x^{o}_{k}$ is scaled using the scaling parameters of matrix $X^{o}_{k}$ to obtain $x_{k}$.

- The matrix $X_{k}$ is decomposed into scores $S_{k}(m-1,a)$ and eigenvectors $V_{k}(n,a)$ using
$a$ factors.

- The matrix of residuals for the k-th training set matrix, $E_{k}(m-1,n)$, is calculated via
Equation 17.

- The residual variance for the k-th training set $X_{k}$ is calculated from the equation:

$$rv_{k} = \frac{\sum_{i=1}^{m-1}\sum_{j=1}^{n} e_{k,i,j}^{2}}{(m-a-2)(n-a)} \qquad (23)$$

where $e_{k,i,j}$ are the elements of $E_{k}$.

- The row vector of predicted residuals for vector $x_{k}$, $ep_{k}$, is calculated using Equation 20.

- The predicted residual variance for vector $x_{k}$, normalized with respect to $rv_{k}$, is
computed using the following expression:

$$rv^{p}_{k} = \frac{\sum_{j=1}^{n} ep_{k,j}^{2}}{(n-a)\,rv_{k}} \qquad (24)$$

- If index k is less than m, then increment k by one and return to the second step of this
procedure.
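The cross-validation loop above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the implementation used for Table 5: the function name `simca_cross_validation` is hypothetical, PCA is performed by singular value decomposition, and the residuals of Equations 17 and 20 (which are not reproduced in this section) are assumed to be the part of each autoscaled row that is not explained by the first a eigenvectors.

```python
import numpy as np

def simca_cross_validation(X, a):
    """Leave-one-out SIMCA sketch (hypothetical helper).

    X : (m, n) array of raw data, a : number of factors.
    Returns, for every sample, the predicted residual variance
    normalized by the residual variance of its training subset.
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    normalized = np.empty(m)
    for k in range(m):
        x_k = X[k]                           # left-out sample
        X_k = np.delete(X, k, axis=0)        # training subset, (m-1, n)
        mean = X_k.mean(axis=0)
        std = X_k.std(axis=0, ddof=1)
        std[std == 0] = 1.0                  # guard for constant columns (assumption)
        Z = (X_k - mean) / std               # autoscaled training subset
        z = (x_k - mean) / std               # left-out sample, same scaling
        # PCA by SVD: rows of Vt are the eigenvectors of Z^T Z
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        V = Vt[:a].T                         # (n, a)
        E = Z - (Z @ V) @ V.T                # training residuals (assumed Equation 17)
        rv = (E ** 2).sum() / ((m - a - 2) * (n - a))      # Equation 23
        e = z - (z @ V) @ V.T                # predicted residuals (assumed Equation 20)
        normalized[k] = (e ** 2).sum() / ((n - a) * rv)    # Equation 24
    return normalized
```

A sample whose value is far above those of the rest of the training set (as sample #45 is in Table 5) would be a candidate outlier; the threshold itself is left to the user.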



For the sake of comparison between the MD-based methods and SIMCA cross-validation, the
same experimental data as in Table 4 were used for the SIMCA calculations presented in
Table 5.



TABLE 5. Predicted residual variances (Equation 24) normalized with respect to residual 
variance for training subsets (Equation 23) for s4 scan, range 200-250, and channels 4 and 5. 
Columns 1 and 2 are computed for the whole training set. Columns 3 and 4 are calculated 
for the training set after outlier removal. 



#    1         2         3         4
1    1.74      1.82      1.32      0.51
2    0.86      0.23      0.98      0.26
3    0.91      1.91      0.97      0.18
4    0.62      5.29      0.58      4.62
5    0.51      9.73      1.18      7.44
6    0.90      4.99      1.30      0.78
7    2.53      1.51      1.57      1.35
8    0.19      0.10      0.71      3.36
9    1.57      1.48      2.53      3.36
10   0.43      1.11      0.56      0.31
11   0.42      1.82      1.10      2.32
12   1.05      3.58      0.93      0.38
13   J.70      0.97      0.89      1.00
14   6.21      0.59      4.29      1.21
15   3.69      0.31      2.03      0.54
16   0.98      0.03      0.34      1.10
17   0.73      0.12      0.60      2.75
18   4.61      0.83      4.16      0.40
19   0.25      1.19      0.36      0.20
20   0.48      1.34      1.74      3.86
21   0.36      1.69      0.58      0.28
22   2.32      2.00      3.30      5.32
23   2.10      2.88      0.64      0.24
24   0.89      2.30      0.30      0.16
25   0.16      0.64      0.21      0.23
26   0.10      0.46      0.16      0.44
27   0.21      0.24      0.27      0.83
28   0.16      0.03      0.25      0.18
29   1.24      0.04      1.72      0.26
30   0.26      0.04      0.70      1.47
31   0.46      0.28      0.34      0.12
32   1.08      0.69      0.30      0.22
33   0.42      0.91      0.34      0.13
34   2.16      1.31      1.25      0.22
35   0.42      1.62      0.80      0.13
36   1.21      1.16      2.12      3.67
37   0.95      0.56      1.62      1.33
38   0.13      0.94      0.17      0.44
39   2.11      1.63      5.99      1.74
40   0.27      0.39      0.36      0.56
41   0.25      0.24      0.20      6.50
42   0.96      0.16      1.26      0.71
43   0.25      0.04      0.47      0.28
44   0.81      0.04      0.33      6.12
45   411.12    7442.55
46   2.42      0.43      1.63      0.36
47   0.13      0.35      0.21      0.21
48   0.57      0.30      0.29      0.26
49   0.26      0.23      0.35      0.22
50   0.97      0.56      2.21      1.71
51   0.93      0.24      1.31      1.53


Based on our experience, the percentage of outliers in the training set does not exceed
5% for the systems, setups and voltammetric methods we have worked with. This relatively
low number of outliers results from the very stable conditions under which the calibration
is performed (including fully controlled composition of the solutions for calibration).
Also, the waveforms of the applied voltammograms are chosen to be as reproducible and
stable as possible.

A relatively low number of outliers in the training set allows us to assume that the
Mahalanobis distance and SIMCA methods are reliable under our conditions. The disadvantage
of the MD method, which fortunately was not encountered, is that it produces inaccurate
results if there are multiple outliers (usually several tens of percent of the training set)
in the data. Methods for dealing with multiple outliers are MCD (minimum covariance
determinant) [10], RHM (resampling by half-means) [11] and SHV (smallest half-volume) [11].
These methods require determining the maximal percentage of outliers in the training set.
Based on this information, the best training subset is selected and used for calibration. In
reference [11], the authors suggest removal of up to 50% of the original training set. Such a
treatment would lead, in our case, to the uncompensated loss of good calibration data,
mostly files corresponding to concentrations close to the lower and upper limits. This would
narrow the concentration range of the training set and impede the prediction performance of
the regression equation.



Determination of the optimal number of factors for calibration 

One of the most effective methods that can be used to aid in determining the optimal 
number of factors for calibration is called PRESS (Prediction residual error sum of squares) 
[1,4,12]. This method is based on the calculation of concentration residuals for different 
numbers of factors. The self-predicted and/or cross-validated concentrations are obtained 
using both principal component regression (PCR) [12,13,14] and partial least-squares (PLS- 
1) [1,2,12,13,14,15] regression. Both regression methods are commonly used and their 
algorithms are described in the literature in great detail. 

If the number of dependent features equals unity, then the expression for PRESS is as
follows:

$$\mathrm{PRESS} = \sum_{i=1}^{m}\left(ec^{o}_{i}\right)^{2} \qquad (25)$$

where $ec^{o}_{i}$ is the concentration residual of the i-th sample, calculated from its original
(not autoscaled) actual concentration and the retransformed (rescaled)
self-predicted/cross-validated concentration via the following dependence:

$$ec^{o}_{i} = c^{o}_{i} - \hat{c}^{o}_{i} \qquad (26)$$

where $\hat{c}^{o}_{i}$ denotes the retransformed (rescaled) concentration predicted via
self-prediction/cross-validation.
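As a small worked illustration of Equations 25 and 26, the sketch below computes the concentration residuals and PRESS from lists of actual and predicted concentrations, both in original (not autoscaled) units. The function names are hypothetical and the numbers in the usage line are made up for the example.

```python
def concentration_residuals(actual, predicted):
    """Equation 26: actual minus predicted concentration, sample by sample."""
    return [c - c_hat for c, c_hat in zip(actual, predicted)]

def press(actual, predicted):
    """Equation 25: sum of squared concentration residuals."""
    return sum(e * e for e in concentration_residuals(actual, predicted))

# Illustrative values only: residuals are 0.0, -0.5 and 1.0
example_press = press([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

The same function serves for self-prediction and for cross-validation; only the origin of the predicted concentrations differs.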

Figures 3a and 3b present the calculated values of self-predicted PRESS using PCR
and PLS-1 for brightener and carrier, respectively. The self-predicted PRESS is the simplest
and fastest method for testing a calibration model. The problem with this approach is that
the model vectors are calculated from these same voltammograms. Therefore, all the
vectors calculated exist in all the training voltammograms. This was not very problematic in
the case of Mahalanobis distance calculations, but here the PRESS plot will continue to fall
as new factors are added to the model and will never rise. It is possible to select the number
of factors as the place where the plot starts to "flatten out". One can notice that the plots in
Figures 3a and 3b (for PCR) start to "flatten out" at a factor number of four. However, this is
an inexact measure, and gives no indication of the true optimum number of factors for the
model when predicting unknown samples.

One can obtain much more reliable data by using cross-validation PRESS for all the samples
in the training set. Each sample, in turn, is omitted from the training set and a model is
calculated with the remaining samples. This model predicts the concentration for the omitted
sample. The squared error between predicted and actual values is calculated to form a single
PRESS value. The sample is then returned to the training set, the next sample is omitted, and
the cycle repeated to calculate another PRESS value. The procedure is repeated until all the
samples have been treated. The PRESS values are summed and this constitutes the PRESS value
for the model. The values of PRESS calculated using cross-validation PCR and PLS-1 for
brightener and carrier are displayed in Figures 4a and 4b, respectively. In Figures 4a and 4b,
one can notice that from 1 to 4 factors the prediction error (PRESS) decreases as each new
factor is added to the model. This indicates that the model is underfit and there are not
enough factors to account completely for the constituents of interest. In both Figures 4a and
4b the PRESS plots reach a minimum (at four factors) and start to ascend again. At this point
the model is beginning to add factors that contain uncorrelated noise, which is not related to
the constituents of interest. The factor number corresponding to the minimum value of PRESS
indicates the optimal number of factors. Although cross-validation PRESS calculations are
much more time-consuming than self-prediction PRESS calculations, the former are recommended
for optimal factor number determination. In order to determine the number of factors
corresponding to the local minimum of PRESS calculated for the smallest possible number of
factors, the ratio R of successive prediction sums of squares is employed [1,4]:

$$R = \frac{\mathrm{PRESS}(a+1)}{\mathrm{PRESS}(a)} \qquad (27)$$

Starting with the number of factors a = 1, if R is less than one, then the increased factor
space yields better predictions; hence the procedure is repeated with a = 2, etc., until the
ratio is greater than one, indicating that the added factor does not improve the predictions.
The R ratio calculated for the data of both Figures 4a and 4b indicates an optimal number of
factors of a = 4 for both regression methods, PCR and PLS-1.
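The R-ratio rule can be sketched as a short Python function. It assumes a list of strictly positive cross-validated PRESS values ordered by number of factors (first entry for one factor); the function name and the example numbers are hypothetical and are not taken from Figures 4a and 4b.

```python
def optimal_factors_by_r_ratio(press_values):
    """Return the smallest a for which PRESS(a+1) / PRESS(a) exceeds one.

    press_values[i] is the cross-validated PRESS for a = i + 1 factors.
    If PRESS never rises, the largest tested number of factors is returned.
    """
    for a in range(1, len(press_values)):
        r = press_values[a] / press_values[a - 1]   # R = PRESS(a+1) / PRESS(a)
        if r > 1.0:
            return a          # adding factor a + 1 no longer improves prediction
    return len(press_values)

# Illustrative PRESS curve with a first local minimum at four factors
print(optimal_factors_by_r_ratio([10.0, 6.0, 4.0, 2.0, 3.0, 2.5, 1.0]))
```

Note that the rule stops at the first local minimum, so a lower PRESS at a larger number of factors (as in the last entry of the example) is deliberately ignored.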

The F statistic, based on PRESS [15], can also be used to aid in the comparison of
the prediction abilities of the two different calibration methods, PCR and PLS-1. Let us
define the F-ratio for two different calibration methods as:

$$F_{\mathrm{PRESS}} = \frac{\mathrm{PRESS}(\mathrm{PCR})}{\mathrm{PRESS}(\mathrm{PLS})} \qquad (28)$$

To illustrate the performance of the $F_{\mathrm{PRESS}}$ parameter, the data from Figures 4a and
4b were recalculated and presented in Figure 5. For the optimal number of factors, a, one
should expect similar performance of PCR and PLS-1, which means the $F_{\mathrm{PRESS}}$ ratio
should be close to unity. One can notice that in Figure 5 $F_{\mathrm{PRESS}}$ is closest to one
for brightener for a = 4, which confirms the conclusion based on R-ratio analysis. However,
the data for carrier in Figure 5 do not provide a conclusive answer, as the
$F_{\mathrm{PRESS}}$ ratio is close to unity for both a = 3 and a = 4.

The other method that can be helpful for determining the optimal number of factors
for calibration is based on the Exner psi (ψ) function [4,16,17] given by:

$$\psi(a) = \left( \frac{\sum_{i=1}^{m}\left(c^{o}_{i} - \hat{c}^{o}_{i}(a)\right)^{2}}{\sum_{i=1}^{m}\left(c^{o}_{i} - \frac{1}{m}\sum_{i=1}^{m} c^{o}_{i}\right)^{2}} \cdot \frac{m}{m-a} \right)^{1/2} \qquad (29)$$

The values of the Exner ψ function calculated for the same concentration data as used for
PRESS calculation in Figures 4a and 4b are presented in Figures 6a and 6b, respectively.
One can easily notice that the Exner ψ function curves in Figures 6a and 6b are qualitatively
analogous to those of PRESS in Figures 4a and 4b, respectively. In a manner similar to
PRESS, the local minimum of the Exner ψ function corresponding to the smallest possible
number of components indicates the optimal number of factors. Additionally, the analysis
of the absolute value of the Exner ψ function provides information about the accuracy of the
calibration model. A value of the Exner ψ function equal to 1.0 is the upper limit of
physical significance, because this means that one has not done any better than simply
guessing that each point has the same value of the grand mean of the experimental data.
Exner [17] proposed that 0.5 should be considered the largest acceptable ψ value, because it
means that the fit is twice as good as guessing the grand mean for each point.
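The interpretation of the Exner function can be checked numerically with a few lines of Python. The sketch assumes the usual form of the function (root of the ratio of the residual sum of squares to the sum of squares about the grand mean, multiplied by m/(m-a)); the function name `exner_psi` and the sample values are hypothetical.

```python
import math

def exner_psi(actual, predicted, a):
    """Exner psi for m samples and a factors (assumed form of Equation 29)."""
    m = len(actual)
    grand_mean = sum(actual) / m
    ss_res = sum((c - c_hat) ** 2 for c, c_hat in zip(actual, predicted))
    ss_tot = sum((c - grand_mean) ** 2 for c in actual)
    return math.sqrt(ss_res / ss_tot * m / (m - a))

actual = [1.0, 2.0, 3.0, 4.0]
# Guessing the grand mean (2.5) for every sample with no factors gives psi = 1
psi_mean_guess = exner_psi(actual, [2.5] * 4, a=0)
# A perfect prediction gives psi = 0
psi_perfect = exner_psi(actual, actual, a=2)
```

Values between these two extremes can then be compared with the 0.5 acceptance limit proposed by Exner.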

Outlier detection within the training set by regression calculation 

Apart from the Mahalanobis distance and SIMCA methods described above, there are
other powerful tools for outlier detection: the F-ratio method based on concentration
residuals ($F^{c}$-ratio) and the plot of Studentized concentration residuals versus leverages.
However, in contrast to the Mahalanobis distance and SIMCA methods, these employ regression
calculations. When the optimum number of factors for the model has been determined, the
concentration residuals are calculated using Equation 26. In the $F^{c}$-ratio method for
cross-validation, the squared residual of a training sample is expressed with respect to the
rest of the training set by the following equation [15]:



$$F^{c}_{i} = \frac{(m-1)\left(ec^{o}_{i}\right)^{2}}{\sum_{j \neq i}\left(ec^{o}_{j}\right)^{2}} \qquad (30)$$
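The idea of expressing one squared residual relative to the rest of the training set can be illustrated as follows. The sketch assumes the commonly used form of the ratio, in which the squared residual of a sample is multiplied by m - 1 and divided by the sum of squared residuals of all other samples; both this form and the function name `f_ratio` are assumptions made for the example.

```python
def f_ratio(residuals):
    """F-ratio of each concentration residual against the remaining ones.

    Assumed form: (m - 1) * e_i**2 / sum of e_j**2 over all j != i.
    At least one of the other residuals must be nonzero.
    """
    m = len(residuals)
    total = sum(e * e for e in residuals)
    return [(m - 1) * e * e / (total - e * e) for e in residuals]

# Illustrative residuals: the last sample stands out
ratios = f_ratio([1.0, 1.0, 1.0, 3.0])
```

A sample whose ratio is much larger than those of the other samples would be examined as a possible concentration outlier.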



Another useful tool for identifying outliers within the training set is a plot of the
internally or externally Studentized concentration residuals versus the leverage value for
each sample [18]. The leverage value gives a measure of how important an individual
training sample is to the overall model. The Studentized residual gives an indication of how
well the sample's predicted concentration is in line with the leverage. Both leverages and
Studentized residuals can be calculated by means of self-prediction or cross-validation. The
approach based on cross-validation has a higher resolution than that based on self-prediction
and is therefore preferred. The algorithm presented below calculates cross-validated
leverages:

- Set the value of index $k = 1$.

- Extract the k-th vector $x^{o}_{k}(n)$ from data matrix $X^{o}(m,n)$. The remaining matrix is
called $X^{o}_{k}(m-1,n)$ and plays the role of the training set matrix in the k-th step.

- Matrix $X^{o}_{k}$ is autoscaled to unit variance to obtain $X_{k}$.

- The vector $x^{o}_{k}$ is scaled using the scaling parameters of matrix $X^{o}_{k}$ to obtain $x_{k}$.

- The matrix $X_{k}$ is decomposed into scores $S_{k}(m-1,a)$ and eigenvectors $V_{k}(n,a)$ using
$a$ factors.

- Scores $s_{k}$ are calculated for the vector $x_{k}$ using Equation 15.

- The vector $s_{k}$ is appended as the k-th row of the matrix, $S$, of scores predicted by
cross-validation.

- If index k is less than m, then k is incremented by one and the procedure returns to the
second step.

- The matrix of scores, $S$, is used to calculate the square "hat" matrix, $H(m,m)$,
according to the equation:

$$H = S\left(S^{T}S\right)^{-1}S^{T} \qquad (31)$$

The diagonal elements of the "hat" matrix, $h_{k,k}$, constitute the leverages. The k-th leverage
corresponds to the k-th sample of the training set.
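The leverage algorithm above can be sketched with NumPy as follows. This is an illustration under stated assumptions: the function name is hypothetical, PCA is done by singular value decomposition, the scores of Equation 15 (not reproduced in this section) are assumed to be the projection of the scaled vector onto the eigenvectors, and a sign convention for the eigenvectors is added so that scores from different cross-validation steps remain comparable.

```python
import numpy as np

def cross_validated_leverages(X, a):
    """Leverages from scores predicted by leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    S = np.empty((m, a))
    for k in range(m):
        X_k = np.delete(X, k, axis=0)            # training subset, (m-1, n)
        mean = X_k.mean(axis=0)
        std = X_k.std(axis=0, ddof=1)
        std[std == 0] = 1.0                      # guard for constant columns (assumption)
        Z = (X_k - mean) / std                   # autoscaled training subset
        z = (X[k] - mean) / std                  # left-out sample, same scaling
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        V = Vt[:a].T                             # eigenvectors, (n, a)
        # Assumed sign convention: largest-magnitude loading of each eigenvector is positive
        idx = np.abs(V).argmax(axis=0)
        V = V * np.sign(V[idx, np.arange(a)])
        S[k] = z @ V                             # predicted scores (assumed Equation 15)
    H = S @ np.linalg.inv(S.T @ S) @ S.T         # Equation 31
    return np.diag(H).copy()
```

Because H is a projection matrix of rank a, the leverages returned by this sketch lie between 0 and 1 and sum to a, which gives a quick sanity check on any implementation.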

The procedure for internally and externally Studentized concentration residuals starts with
the calculation of the column vector of concentration residuals, $ec^{o}$ (Equation 26). The
predicted concentrations for the residuals are calculated by PCR or PLS-1 cross-validation
for $a$ factors. The number of factors must be the same as that used for the "hat" matrix.
The internally Studentized residual for the k-th sample of the training set is computed
employing the following dependence [18]:

$$r_{k} = \frac{ec^{o}_{k}}{s\left(1-h_{k,k}\right)^{1/2}} \qquad (32)$$

where $s$ is the square root of the residual mean square, defined by the equation:

$$s^{2} = \frac{\left(ec^{o}\right)^{T}ec^{o}}{m-a} \qquad (33)$$

The externally Studentized residual for the k-th sample of the training set is calculated
using the following equation [18]:

$$t_{k} = \frac{ec^{o}_{k}}{s(k)\left(1-h_{k,k}\right)^{1/2}} \qquad (34)$$

where $s(k)$ is defined by the expression:

$$s^{2}(k) = \frac{(m-a)\,s^{2} - \left(ec^{o}_{k}\right)^{2}/\left(1-h_{k,k}\right)}{m-a-1} \qquad (35)$$
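Equations 32 to 35 translate into a short Python function. The sketch assumes the standard definitions of internally and externally Studentized residuals; the function name and the numerical inputs are hypothetical, and the residuals and leverages are taken as already computed.

```python
import math

def studentized_residuals(residuals, leverages, a):
    """Return (internal, external) Studentized residuals for a factors.

    With cross-validated inputs the bracket of Equation 35 is not guaranteed
    to stay positive; no special handling is attempted in this sketch.
    """
    m = len(residuals)
    s2 = sum(e * e for e in residuals) / (m - a)                   # Equation 33
    internal, external = [], []
    for e, h in zip(residuals, leverages):
        internal.append(e / math.sqrt(s2 * (1.0 - h)))             # Equation 32
        s2_k = ((m - a) * s2 - e * e / (1.0 - h)) / (m - a - 1)    # Equation 35
        external.append(e / math.sqrt(s2_k * (1.0 - h)))           # Equation 34
    return internal, external

# Illustrative inputs: five samples, one factor, equal leverages
r_int, r_ext = studentized_residuals([1.0, -1.0, 2.0, -2.0, 0.5], [0.2] * 5, a=1)
```

Plotting either list against the leverages gives the kind of diagnostic plot described for Figure 7.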

An example plot of externally Studentized concentration residuals versus leverages
calculated by cross-validation for the training set is shown in Figure 7 (ba2, ch5, 300-860).
There are three obvious outliers in Figure 7. For the first of them, only the value of the
externally Studentized concentration residual (-4.74) is outlying, while its leverage value
(0.0153) lies within the range of the training set. In contrast, the second sample has the
highest, outlying value of leverage (0.513), while its externally Studentized concentration
residual exceeds the training set cluster values only slightly. The third outlier is
identified by both its leverage (0.421) and its externally Studentized concentration residual
(3.59). Based on the above, one can conclude that only the coupling of leverages and
Studentized concentration residuals gives a reliable approach for outlier detection within
the training set.

Calibration calculation 

It is recommended to perform the calculations for obtaining the optimal number of
factors (by PRESS and/or the Exner ψ function) and for eliminating outliers from the training
set by regression calculation (methods based on concentration residuals: F-ratio and the plot
of Studentized concentration residuals versus leverages) in an iterative sequence. Iteration
should stop when the optimal number of factors has been determined and there are no outliers
in the training set.



Having determined the correct number of factors and the outlier-free training set, one
can perform the final regression calculation using the PLS-1 or PCR method.

As an example, the acid calibration in the five-component (copper, 14-24 g/L;
acid, 140-220 g/L; chloride, 30-80 ppm; brightener, 2-9 mL/L; carrier, 3-8 mL/L) PC 75
copper plating bath (Technic, Inc.) is presented below. The calibration was performed based
on a 25-solution matrix analogous to that of Table 1 but having five components instead of
six. The scan chosen for the calibration was b26, channel 3 (see Figure 8) for the range of
4000-4800 with the optimal number of factors of 3. It is clearly demonstrated that the PCR
and PLS-1 methods are capable of creating an accurate calibration model from a range of the
scan that contains no characteristic or significant points for univariate regression. The
calibration is examined based on values of squared correlation coefficients calculated from
original actual concentrations and rescaled predicted concentrations:

[Equation (36), defining the squared correlation coefficient (r°)²; the expression is not
legible in the source text]



and PRESS (Equation 25) for self-prediction and cross-validation for both regression
methods, PCR and PLS-1 (Table 6). Both regression methods, PCR and PLS-1, perform
very similarly, which is also apparent in Figure 9, which presents actual and cross-validated
acid concentrations.

TABLE 6. Squared correlation coefficients, (r°)², (Equation 36) and PRESS (Equation 25)
calculated by PCR and PLS-1 as self-prediction and cross-validation for acid calibration for
PC75 plating bath (scan dq21b26, ch 3, range 4000-4800, 3 factors)

           PCR                    PLS-1
           selfpred   Xval        selfpred   Xval
(r°)²      0.9769     0.9724      0.9771     0.9729
PRESS      925.8      1103.2      914        1082.8



The level of accuracy presented in Table 6 and Figure 9 is more than satisfactory for
our purposes, as the prediction error constitutes a small fraction of the acid concentration
range for the PC 75 (Technic, Inc.) copper plating bath. Usually, a plating bath is designed
to perform satisfactorily as long as the concentrations of all bath constituents are
maintained within certain ranges, which define the calibration ranges.

A very important advantage of multivariate regression methods in comparison to
univariate regression methods is the ability of the multivariate techniques to utilize
simultaneously the information coming from different sources. This collective information
can be used as a basis for a calibration producing a more accurate and less biased model than
multivariate calibrations based on data coming from a single source. In order to
generate an example collective data set, portions of two different voltammograms (bath
PC75, ba2, ch 3, 401-701 and ch 4, 301-601) were "glued" together and regressed against
brightener concentration. The modeling power corresponding to these ranges was
satisfactorily high. This calibration is based on the same training set as was used for the
previous example of acid calibration in a PC75 bath. Neither scan used for brightener
calibration presents much value for univariate regression, as they do not contain
any characteristic single points (such as peaks). Additionally, the least squares regression
calculation conducted for each j-th point of the autoscaled AC voltammograms (procedure steps
1.1-1.5) does not produce regression coefficients satisfactory for the purposes discussed in
this text (Figure 10).

TABLE 7. Squared correlation coefficients, (r°)² (Equation 36) calculated by PCR and
PLS-1 as self-prediction and cross-validation for number of factors 2 and 3 for brightener
calibration for scan dq21ba2, channels: 3 (range 401-701), 4 (range 301-601) and "glued"
data for channels 3 (range 401-701) and 4 (range 301-601)

# of      PCR                                                               PLS-1
factors   ch3        ch3      ch4        ch4      ch3+ch4    ch3+ch4    ch3        ch3      ch4        ch4
          selfpred   Xval     selfpred   Xval     selfpred   Xval       selfpred   Xval     selfpred   Xval
2         0.7098     0.6903   0.9221     0.9175   0.9406     0.9372     0.7200     0.7009   0.9229     0.91!
3         0.7536     0.7243   0.9317     0.9250   0.9407     0.9360     0.9123     0.8985   0.9371     0.93:



Table 7 shows squared correlation coefficients for brightener calibration calculated
by employing Equation 36 for channel 3 only, channel 4 only and "glued" data for channels
3 and 4. One can notice that the "glued" data set produces higher (r°)² values for both
self-prediction and cross-validation for PCR and PLS-1. One can also notice that the range
chosen for brightener calibration from ba2, ch4, 301-601 partially corresponds to the very
low values of r² calculated by LSR. However, as was checked by cross-validation, such an
empirically extended range gives higher (r°)² for PCR- and PLS-1-based regression than the
narrower range determined purely using the R -rf parameter. 

Comparing the data from Table 7 to those of Figure 10, one can also notice that the
squared correlation coefficients calculated with the multivariate techniques, PCR and PLS-1,
for the ba2, channel 4 data are much higher than those calculated with least-squares
regression.

Calibration transfer 

The calibration transfer procedure is intended to overcome three major problems
that impede the prediction performance of originally calculated regression equations. The
first problem occurs when a calibration model developed on one instrument is transported to
another instrument. A second problem is observed when the instrumental responses
measured on a single instrument change over a period of time for any reason (electronic
drift). Finally, a third problem is caused by the differences between samples coming from
different production batches. All three problems involve a calibration on a primary
instrument and an attempt to use the calibration model on a secondary instrument that
produces responses that differ in some way. These problems have been encountered quite
often in our experimental practice. To deal with them, several calibration transfer
techniques were applied. To the best of our knowledge, calibration transfer coupled with
data decomposition techniques has never been applied previously to any electrochemical
data. The following techniques are presented below: Direct Standardization [19] using
either raw data (DS) or scores (DSS), Piecewise Direct Standardization [19] using raw data
(PDS), Direct Standardization with Additive Background Correction [20] using either raw
data (DSB) or scores (DSBS), and Piecewise Direct Standardization with Additive Background
Correction [20] using raw data (PDSB). These techniques are well described in the
literature, apart from DSS and DSBS. Therefore, the DSS and DSBS methods are presented in
detail.

The procedure for DSS is as follows:

- The original full calibration data set for the primary instrument, $X^{o}_{1}(m,n)$, is
decomposed by PCA into scores, $S_{1}(m,a)$, and eigenvectors, $V_{1}(n,a)$. The lower index
"1" denotes the primary instrument.

- Scores $S_{1}$ and corresponding concentrations $c^{o}_{1}(m)$ are mean-centered (Equation 4) to
obtain $\tilde{S}_{1}$ and $\tilde{c}_{1}$, respectively. They constitute the following regression
equation:

$$\tilde{c}_{1} = \tilde{S}_{1}\beta \qquad (37)$$

where $\beta(a,1)$ is a column vector of regression coefficients.

- The regression coefficients are calculated employing the expression:

$$\hat{\beta} = \left(\tilde{S}_{1}\right)^{+}\tilde{c}_{1} \qquad (38)$$

where $(\tilde{S}_{1})^{+}$ is the pseudoinverse of matrix $\tilde{S}_{1}$ calculated via the
following equation:

$$\left(\tilde{S}_{1}\right)^{+} = \left(\tilde{S}_{1}^{T}\tilde{S}_{1}\right)^{-1}\tilde{S}_{1}^{T} \qquad (39)$$

- The scores for the original calibration data subset for the primary, $X^{o,s}_{1}(m_{s},n)$,
and secondary, $X^{o,s}_{2}(m_{s},n)$, instrument are calculated using Equations 40 and 41,
respectively:

$$S^{s}_{1} = X^{o,s}_{1}V_{1} \qquad (40)$$

$$S^{s}_{2} = X^{o,s}_{2}V_{1} \qquad (41)$$

The indexes lower "2" and upper "s" denote the secondary instrument and the calibration
subset, respectively.

- The transformation matrix is calculated as:

$$F = \left(S^{s}_{2}\right)^{+}S^{s}_{1} \qquad (42)$$

- Scores are calculated for a voltammogram obtained on the secondary instrument for an
unknown sample, $x^{o}_{2,u}(1,n)$, employing the following expression:

$$s_{2,u} = x^{o}_{2,u}V_{1} \qquad (43)$$

- Scores for an unknown sample from the secondary instrument are multiplied by the
transformation matrix:

$$s^{f}_{2,u} = s_{2,u}F \qquad (44)$$

- The vector $s^{f}_{2,u}$ is centered using the grand mean for the primary calibration to obtain
$\tilde{s}^{f}_{2,u}$. This vector is then used in the regression equation to obtain the
mean-centered concentration of the unknown sample:

$$\tilde{c}_{2,u} = \tilde{s}^{f}_{2,u}\hat{\beta} \qquad (45)$$

- Finally, the mean-centered concentration of the unknown sample from the secondary
instrument is rescaled employing parameters from the concentrations of the training set,
resulting in the value of the predicted concentration, $c^{o}_{2,u}$.
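The DSS steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and not the MATLAB code referred to in the text: the names `dss_fit` and `dss_predict` are hypothetical, PCA is performed by singular value decomposition of the data as given, and `numpy.linalg.pinv` is used for the pseudoinverse (it coincides with Equation 39 when the score matrix has full column rank).

```python
import numpy as np

def dss_fit(X1, c1, X1s, X2s, a):
    """Fit DSS: X1, c1 full primary calibration; X1s, X2s transfer subsets."""
    _, _, Vt = np.linalg.svd(X1, full_matrices=False)
    V1 = Vt[:a].T                                   # eigenvectors, (n, a)
    S1 = X1 @ V1                                    # primary scores, (m, a)
    s_mean, c_mean = S1.mean(axis=0), c1.mean()
    beta = np.linalg.pinv(S1 - s_mean) @ (c1 - c_mean)   # Equations 37-39
    S1s, S2s = X1s @ V1, X2s @ V1                   # Equations 40 and 41
    F = np.linalg.pinv(S2s) @ S1s                   # Equation 42
    return {"V1": V1, "F": F, "beta": beta, "s_mean": s_mean, "c_mean": c_mean}

def dss_predict(model, x2u):
    """Predict the concentration for one secondary-instrument voltammogram."""
    s = x2u @ model["V1"]                           # Equation 43
    s_f = s @ model["F"]                            # Equation 44
    c_centered = (s_f - model["s_mean"]) @ model["beta"]  # Equation 45
    return c_centered + model["c_mean"]             # rescale to original units

# Synthetic check: two latent components, secondary instrument has a gain of 2
t = np.arange(1.0, 9.0)
u = np.array([1.0, -1.0, 2.0, -2.0, 1.0, -1.0, 2.0, -2.0])
p = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
q = np.array([0.5, -1.0, 1.0, 0.0, 2.0])
X1 = np.outer(t, p) + np.outer(u, q)
c1 = 3.0 * t + 1.0
model = dss_fit(X1, c1, X1[:4], 2.0 * X1[:4], a=2)
prediction = dss_predict(model, 2.0 * (2.5 * p + 0.5 * q))   # true value is 8.5
```

Because the transformation is estimated between a-dimensional score matrices instead of n-dimensional raw data, the matrix F is only a by a, which is consistent with the speed advantage over regular DS reported in the text.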



The initial five steps of the procedure for DSBS are identical to the initial steps of
the procedure for DSS. However, before applying the regression equation, several
additional coefficients should be calculated. The procedure for DSBS is as follows:

- The original full calibration data set for the primary instrument, $X^{o}_{1}(m,n)$, is
decomposed by PCA into scores, $S_{1}(m,a)$, and eigenvectors, $V_{1}(n,a)$. The lower index
"1" denotes the primary instrument.

- Scores $S_{1}$ and corresponding concentrations $c^{o}_{1}(m)$ are mean-centered (Equation 4) to
obtain $\tilde{S}_{1}$ and $\tilde{c}_{1}$, respectively. They constitute the regression
Equation 37.

- The regression coefficients are calculated employing Equation 38.

- The scores for the original calibration data subset for the primary, $X^{o,s}_{1}(m_{s},n)$,
and secondary, $X^{o,s}_{2}(m_{s},n)$, instrument are calculated using Equations 40 and 41,
respectively.

- The transformation matrix is calculated according to Equation 42.

- An estimate of the regression vector for full calibration for the primary instrument is
calculated employing the expression:

$$b_{1} = \bar{c}_{1} - \bar{s}_{1}\hat{\beta} \qquad (46)$$

where $\bar{c}_{1}$ and $\bar{s}_{1}$ contain the mean column values of $c^{o}_{1}$ and $S_{1}$,
respectively, and $\hat{\beta}$ is calculated using Equation 38.

- The background vector $b_{g}(1,a)$, introduced to accommodate the additive background
difference between the instruments, is calculated by applying the following equation:

$$b_{g} = \bar{s}^{s}_{1} - \bar{s}^{s}_{2}F \qquad (47)$$

where vectors $\bar{s}^{s}_{1}$ and $\bar{s}^{s}_{2}$ contain the mean column values of matrices
$S^{s}_{1}$ and $S^{s}_{2}$, respectively.

- Scores are calculated for a voltammogram obtained on the secondary instrument for an
unknown sample, $x^{o}_{2,u}(1,n)$, employing Equation 43.

- Scores for an unknown sample from the secondary instrument are multiplied by the
transformation matrix (Equation 44).

- Finally, the following equation is used to predict the concentration of the unknown
sample analyzed with the secondary instrument:

$$c^{o}_{2,u} = \left(s^{f}_{2,u} + b_{g}\right)\hat{\beta} + b_{1} \qquad (48)$$

where $s^{f}_{2,u}$, $b_{g}$, $\hat{\beta}$ and $b_{1}$ are computed from Equations 44, 47, 38 and
46, respectively.
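A corresponding NumPy sketch for DSBS is given below. As with any illustration of this kind, the function names are hypothetical, PCA is performed by singular value decomposition, and `numpy.linalg.pinv` stands in for the pseudoinverse; the transformation matrix is computed from the uncentered subset scores, following the order of the steps listed above.

```python
import numpy as np

def dsbs_fit(X1, c1, X1s, X2s, a):
    """Fit DSBS: X1, c1 full primary calibration; X1s, X2s transfer subsets."""
    _, _, Vt = np.linalg.svd(X1, full_matrices=False)
    V1 = Vt[:a].T                                     # eigenvectors, (n, a)
    S1 = X1 @ V1                                      # primary scores, (m, a)
    beta = np.linalg.pinv(S1 - S1.mean(axis=0)) @ (c1 - c1.mean())  # Equation 38
    S1s, S2s = X1s @ V1, X2s @ V1                     # Equations 40 and 41
    F = np.linalg.pinv(S2s) @ S1s                     # Equation 42
    b1 = c1.mean() - S1.mean(axis=0) @ beta           # Equation 46
    bg = S1s.mean(axis=0) - S2s.mean(axis=0) @ F      # Equation 47
    return V1, F, beta, b1, bg

def dsbs_predict(model, x2u):
    """Predict the concentration for one secondary-instrument voltammogram."""
    V1, F, beta, b1, bg = model
    s_f = (x2u @ V1) @ F                              # Equations 43 and 44
    return (s_f + bg) @ beta + b1                     # Equation 48

# Synthetic check: secondary instrument differs from the primary by a gain of 2
t = np.arange(1.0, 9.0)
u = np.array([1.0, -1.0, 2.0, -2.0, 1.0, -1.0, 2.0, -2.0])
p = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
q = np.array([0.5, -1.0, 1.0, 0.0, 2.0])
X1 = np.outer(t, p) + np.outer(u, q)
c1 = 3.0 * t + 1.0
dsbs_model = dsbs_fit(X1, c1, X1[:4], 2.0 * X1[:4], a=2)
dsbs_prediction = dsbs_predict(dsbs_model, 2.0 * (2.5 * p + 0.5 * q))  # true value is 8.5
```

In this purely multiplicative example the background vector comes out as zero; with real instruments it would absorb the additive difference between the two sets of subset scores.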

All calibration transfer techniques were implemented in the MATLAB environment. 

Procedures for DSS and DSBS were written following exactly the algorithms 
presented above. Remaining standardization procedures were implemented using the PLS 
Toolbox. 

TABLE 8. Squared correlation coefficient, (r°)², (Equation 36) and PRESS (Equation 25)
calculated using predicted acid (scan dq21b26, channel 3, range 3600-4350, 4 factors)
concentrations for a secondary instrument employing regression equations obtained via
various standardization methods: no standardization, DS, DSB, PDS, PDSB, DSS and
DSBS.

Method of standardization   r²        PRESS     ref.
no standardization          0.97592   393020    Fig. 9a
DS                          0.96003   1309.1    Fig. 9b
DSB                         0.96737   930.5     Fig. 9c
PDS                         0.97592   929       Fig. 9d
PDSB                        0.97613   663.1     Fig. 9e
DSS                         0.9707    1015.7    Fig. 9f
DSBS                        0.97756   611.1     Fig. 9g



The performance of DS, DSB, PDS, PDSB, DSS and DSBS is compared in Figure 11 and in
Table 8 for the example of sulfuric acid calibration for the PC 75 copper plating bath
(Technic, Inc.). The regression equation calculated for the calibration of the primary
instrument was used to predict concentrations based on the voltammograms obtained on the
secondary instrument. The predicted concentrations calculated using various calibration
transfer techniques are presented in comparison to actual concentrations. In Figure 11a, no
calibration transfer techniques were applied and the data from the secondary instrument were
directly predicted with the primary instrument regression equation. Analyzing the data in
Table 8, one can notice that the squared correlation coefficient (Equation 36) is not a
sufficient parameter for measuring the performance of calibration transfer techniques:
without standardization, r² is 0.97592 while PRESS is 393020. Therefore, the performance was
also analyzed based on PRESS (Equation 25) values. The techniques with additive background
correction resulted in lower values of PRESS than the corresponding techniques without
additive background correction, suggesting the existence of a structured, nonlinear
background. DSS performs more accurately and much faster than regular DS. DSBS gives the
most accurate prediction. Although the performances of PDSB and DSBS are very similar to
each other, DSBS is preferred because a much shorter time is required for its computations.

The present invention has been described in detail, including the preferred
embodiments thereof. However, it will be appreciated that those skilled in the art, upon
consideration of the present disclosure, may make modifications and/or improvements on
this invention and still be within the scope and spirit of this invention as set forth in the
following claims.